We obtain the optimal scheme for estimating unknown qubit mixed states when an arbitrary 
number N of identically prepared copies is available. We discuss the case of states in the whole 
Bloch sphere as well as the restricted situation where these states are known to lie on the equatorial 
plane. For the former case we obtain that the optimal measurement does not depend on the prior 
probability distribution provided it is isotropic. Although the equatorial-plane case does not have 
this property for arbitrary N, we give a prior-independent scheme which becomes optimal in the 
asymptotic limit of large N. We compute the maximum mean fidelity in this asymptotic regime for 
the two cases. We show that within the pointwise estimation approach these limits can be obtained 
in a rather easy and rapid way. This derivation is based on heuristic arguments that are made 
rigorous by using van Trees inequalities. The interrelation between the estimation of the purity and 
the direction of the state is also discussed. In the general case we show that they correspond to 
independent estimations whereas for the equatorial-plane states this is only true asymptotically. 

PACS numbers: 03.67.Hk, 03.65.Ta 



I. INTRODUCTION 

Two-state systems or qubits are the building blocks of 
many applications in Quantum Information. Although 
they are commonly assumed to be in pure states, in real 
situations they are not. State preparation, processing, 
quantum channels, etc. are inevitably imperfect, which 
means that any quantum system is, in fact, in a mixed 
state. The accurate estimation of the parameters that 
characterize qubit mixed states is therefore of utmost rel- 
evance for practical applications. The aim of this work 
is to find the optimal (most accurate) scheme to perform 
this task. 

So far, most of the work in state estimation has focused 
on pure qubit states [1-3] and fewer quantitative results 
have been obtained for qubit mixed states [4-9] . One ob- 
vious reason for this is the greater complexity of the esti- 
mation procedure. Whereas pure states are fully charac- 
terized by just two parameters — those specifying a point 
on the surface of the Bloch sphere, i.e., a unit vector — 
for a mixed state an additional parameter is required to 
specify its purity, by which we mean the distance from 
the center of the Bloch sphere to the point that represents 
the state. This brings a theoretical subtlety: we will need 
to identify a uniform prior distribution for the purity. In 
contrast to the pure-state case where there is a "natural" 
uniform probability distribution — the invariant measure 
on the 2-sphere — , for mixed states there is no unique 
choice. A uniform distribution must be isotropic (invari- 
ant under rotations of the Bloch sphere), but the purity, 
which is itself invariant, can be distributed according to 
a whole class of functions [10, 11], depending on several 
criteria. Despite this ambiguity, our results turn out to 
be rather general and, in particular, they do not depend 
on the specific choice of an isotropic purity prior. 



In this paper, we assume that we have N identically 
prepared systems upon which we can perform general- 
ized measurements. From their outcomes we can infer 
the value of the parameters that characterize the state of 
the systems. The quality or accuracy of the estimation is 
quantified by the fidelity (to be defined in the next sec- 
tion). The average of the fidelity over the prior and the 
outcome distribution provides a useful summary param- 
eter of the overall quality of the estimation scheme. This 
problem was partially addressed in [5]. Here we present 
an alternative formulation that enables us to apply the 
approach to new, practically relevant situations and find 
many explicit results. 

To be more specific, we will study two types of sit- 
uation: that of estimating an a priori completely un- 
known qubit state and that of estimating a state that is 
known to lie on an equatorial plane of the Bloch sphere. 
We call the former the 3D case (or just 3D for short), 
as the state can be represented by any point in the 3- 
dimensional Bloch sphere. By the same logic, we call the 
latter 2D. The 2D case is useful because in many applica- 
tions quantum states can be parametrized by the purity 
and a phase; e.g., linearly polarized photons. The 2D case 
also exhibits some remarkable theoretical features. For 
instance, we will show that while for 3D states the opti- 
mal measurement is essentially unique, independently of 
the isotropic prior, this is not so for 2D states, though this 
feature is recovered in the asymptotic limit of large N. 

We will first address the problem from a Bayesian point 
of view, which will provide explicit results for any fi- 
nite N. We will also take a steep dive into the asymp- 
totic regime of the estimation schemes. It is clear that 
unknown states can only be estimated with perfect accuracy in the limit $N \to \infty$. The rate at which this perfect determination limit is achieved as N increases is a very informative parameter. It is useful, e.g., to compare 
different estimation schemes. If two schemes have the 
same rate, we say that they are (asymptotically) equiva- 
lent. The asymptotic behavior is also a central notion in 
statistics, where there exists a wealth of results and very 
powerful techniques [12, 13]. 

Within the statistical framework, one optimizes over 
all measurements and estimators, when the signal state 
is taken to be fixed. It turns out, under regularity condi- 
tions, that the maximum likelihood estimator is asymp- 
totically optimal whatever the true signal state. The 
mean square error of the estimator gives a measure of 
the quality of the scheme. This error can be related to the 
fidelity through the Fisher information matrix, thus pro- 
viding a connection with the Bayesian approach. In this 
context, the prior distribution plays a very minor role. In 
contrast, within the Bayesian approach the prior distri- 
bution does play a significant role because, as mentioned 
above, one is interested in obtaining an estimation that 
is optimal on average. 

Here we present in a fairly comprehensive way the application of the two approaches to the asymptotic behavior of qubit mixed state estimation. We will see that both yield the same results. This fact has important consequences. First, it tells us that the asymptotic behaviour of the optimal mean fidelity only depends on the prior as an average of the optimal pointwise (i.e., for a fixed state) fidelities. Second, the Bayesian approach provides an explicit scheme that attains the pointwise bounds. It is worth pointing out that for some restricted schemes and some priors this might not be the case. For instance, it is known that a scheme based on fixed local measurements with the Bures prior distribution [14] does not approach unity at a rate 1/N [8], as a pointwise approach would indicate. Even more surprising, in this situation the Bayesian and the Maximum Likelihood estimation give different asymptotic average fidelities [8], in contrast to the common lore that both estimators should be asymptotically equivalent, pointwise. The nonequivalences here all have simple explanations. Pointwise, everything is asymptotically equivalent and does converge at rate 1/N. However, the convergence is not uniform or the integrated coefficient of 1/N diverges. 

This paper is organised as follows. In the next section we introduce the notation and main concepts that will be used throughout this work. In Sec. III we obtain the optimal estimation protocol for any number of copies of the state in both the 3D and the 2D cases. In Secs. IV and V we compute the asymptotic expression of the fidelity from both the Bayesian and the pointwise approaches, respectively. The derivation of the latter is done through a rather self-contained presentation since some of the techniques may not be so well known among physicists. In Sec. VI we summarise our main results. We have relegated many technical details to the appendices for the benefit of readers not interested in technicalities. 



II. PRELIMINARIES 

Consider an ensemble of N identically prepared states $[\rho(\vec r)]^{\otimes N}$, where $\rho(\vec r)$ is a density matrix with Bloch representation given by 

$$\rho(\vec r) = \frac{1 + \vec r\cdot\vec\sigma}{2}. \tag{2.1}$$

Here $\vec\sigma = (\sigma^x, \sigma^y, \sigma^z)$, where $\sigma^a$, $a = x, y, z$, are the usual Pauli matrices and $\vec r$ is a point in the Bloch sphere $\{\vec r : |\vec r| \le 1\}$. We will drop $\vec r$ and write simply $\rho$ where no ambiguity arises. 

A measurement on $\rho^{\otimes N}$ is represented by a Positive Operator Valued Measure (POVM). It is defined by a set $O = \{O_\chi\}$ of positive operators such that 

$$\sum_\chi O_\chi = \mathbb{1}, \tag{2.2}$$

where $\chi$ refers to the various outcomes that can occur. It can be a discrete or a continuous variable. 

In order to estimate $\rho$ we proceed as follows. We first perform a measurement on $\rho^{\otimes N}$, from which we obtain an outcome $\chi$. Based on $\chi$, an estimate for $\rho$ can be guessed: $\rho_\chi$. Its quality is quantified by the fidelity, defined as [14] 

$$f(\vec r, \vec R_\chi) = \left[\mathrm{tr}\sqrt{\rho^{1/2}\,\rho_\chi\,\rho^{1/2}}\right]^2, \tag{2.3}$$

which determines the maximum distinguishability between $\rho$ and $\rho_\chi$ that can be achieved by any measurement [15]. For qubits, Eq. (2.3) reads 

$$f(\vec r, \vec R_\chi) = \frac{1 + \vec r\cdot\vec R_\chi + \sqrt{1-r^2}\,\sqrt{1-R_\chi^2}}{2}, \tag{2.4}$$

where $\vec r$ and $\vec R_\chi$ are the Bloch vectors of the states $\rho$ and $\rho_\chi$ respectively, $r = |\vec r|$ and $R_\chi = |\vec R_\chi|$. 
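As a numerical cross-check of Eqs. (2.3) and (2.4), the following minimal sketch evaluates the fidelity both from the matrix definition and from the Bloch-vector formula. The function names (`rho_from_bloch`, `psd_sqrt`, `fidelity_matrix`, `fidelity_bloch`) and the sample Bloch vectors are illustrative choices, not part of the original text.

```python
import numpy as np

# Pauli matrices
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def rho_from_bloch(r):
    """Density matrix (1 + r.sigma)/2 of Eq. (2.1)."""
    x, y, z = r
    return 0.5 * (np.eye(2) + x * SX + y * SY + z * SZ)

def psd_sqrt(a):
    """Square root of a positive semidefinite Hermitian matrix."""
    vals, vecs = np.linalg.eigh(a)
    vals = np.clip(vals, 0.0, None)          # remove tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def fidelity_matrix(rho, sigma):
    """[tr sqrt(rho^(1/2) sigma rho^(1/2))]^2, Eq. (2.3)."""
    s = psd_sqrt(rho)
    inner = s @ sigma @ s
    inner = 0.5 * (inner + inner.conj().T)   # enforce Hermiticity numerically
    vals = np.clip(np.linalg.eigvalsh(inner), 0.0, None)
    return float(np.sum(np.sqrt(vals)) ** 2)

def fidelity_bloch(r, R):
    """Qubit fidelity in terms of Bloch vectors, Eq. (2.4)."""
    r = np.asarray(r, dtype=float)
    R = np.asarray(R, dtype=float)
    return 0.5 * (1 + r @ R + np.sqrt(1 - r @ r) * np.sqrt(1 - R @ R))

# Two arbitrary mixed states (illustrative values only)
r_true, r_guess = (0.3, -0.2, 0.5), (0.1, 0.4, -0.6)
f_def = fidelity_matrix(rho_from_bloch(r_true), rho_from_bloch(r_guess))
f_bloch = fidelity_bloch(r_true, r_guess)
```

The two numbers `f_def` and `f_bloch` should agree to numerical precision for any pair of Bloch vectors inside the unit ball.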

In the Bayesian approach the overall performance of the estimation procedure is quantified by the average fidelity F, hereafter fidelity in short. It is the average of (2.3) over the prior probability distribution, which we denote $d\rho$, and over all possible outcomes $\chi$ of a given measurement, namely 

$$F = \sum_\chi \int d\rho\; f(\vec r, \vec R_\chi)\; p(\chi|\vec r), \tag{2.5}$$

where $p(\chi|\vec r)$ is the conditional probability of obtaining outcome $\chi$ given that the signal state has Bloch vector $\vec r$. These probabilities are determined by the expectation values of the positive operators $O_\chi$, i.e., $p(\chi|\vec r) = \mathrm{tr}[O_\chi\,\rho^{\otimes N}]$. Our aim is to maximize (2.5). 

For a given measurement O, there always exists an optimal guess or estimator. To prove this, we first introduce the four dimensional Euclidean vector 

$$\mathbf r = (r^0, r^x, r^y, r^z) = (r^0, \vec r) = (\sqrt{1-r^2}, \vec r). \tag{2.6}$$

Note that $\mathbf r\cdot\mathbf r' = r^0 r'^0 + \vec r\cdot\vec r'$ and $|\mathbf r| = \sqrt{\mathbf r\cdot\mathbf r} = 1$. With this, the average fidelity reads 

$$F = \frac{1}{2}\left(1 + \sum_\chi \int d\rho\;\mathbf r\cdot\mathbf R_\chi\; p(\chi|\vec r)\right), \tag{2.7}$$

where $\mathbf R_\chi = (R^0_\chi, \vec R_\chi)$ is defined in analogy to (2.6). A straightforward use of the Schwarz inequality gives an upper bound of F that is saturated with the choice 

$$\mathbf R_\chi = \frac{\mathbf V_\chi}{|\mathbf V_\chi|}, \qquad \mathbf V_\chi = \int d\rho\;\mathbf r\; p(\chi|\vec r). \tag{2.8}$$

Using (2.8), the maximum fidelity is 

$$F = \frac{1}{2}\Big(1 + \sum_\chi |\mathbf V_\chi|\Big) \equiv \frac{1}{2}(1 + \Delta). \tag{2.9}$$

Since the guess (2.8) satisfies $|\mathbf R_\chi| = 1$ and its first component is non-negative, it always gives a physical state. In fact (2.8) is the best state that can be inferred and (2.9) is the maximum fidelity that can be obtained given O and the prior $d\rho$. 
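To illustrate how Eqs. (2.8) and (2.9) are used in practice, the sketch below evaluates the optimal guesses and $\Delta$ for a single copy (N = 1) measured along the z-axis, with outcome probabilities $p(\pm|\vec r) = (1 \pm r\cos\theta)/2$. The prior $w(r) = 3r^2$ (uniform in the Bloch ball) is an assumption made only for this example, as are the quadrature sizes and all function names.

```python
import numpy as np

def quadrature_prior(n_r=200, n_c=40, n_phi=16):
    """Nodes/weights for d(rho) = w(r) dr dn with the ASSUMED prior w(r) = 3 r^2."""
    x, wx = np.polynomial.legendre.leggauss(n_r)
    r = 0.5 * (x + 1)                                  # radial nodes in [0, 1]
    wr = 0.5 * wx * 3 * r ** 2                         # includes w(r) = 3 r^2
    c, wc = np.polynomial.legendre.leggauss(n_c)       # c = cos(theta) in [-1, 1]
    phi = 2 * np.pi * np.arange(n_phi) / n_phi         # uniform azimuthal grid
    Rg, Cg, Pg = np.meshgrid(r, c, phi, indexing="ij")
    # dn = d(cos theta) d(phi) / (4 pi): weight wc/2 for cos(theta), 1/n_phi for phi
    W = wr[:, None, None] * (wc[None, :, None] / 2) / n_phi * np.ones_like(Pg)
    s = np.sqrt(1 - Cg ** 2)
    four_vec = np.stack([np.sqrt(1 - Rg ** 2),         # r^0 of Eq. (2.6)
                         Rg * s * np.cos(Pg),
                         Rg * s * np.sin(Pg),
                         Rg * Cg])
    return four_vec.reshape(4, -1), W.ravel()

def optimal_guess_and_delta(likelihoods, four_vec, W):
    """V_chi and R_chi of Eq. (2.8), and Delta = sum_chi |V_chi| of Eq. (2.9)."""
    V = np.array([(four_vec * (W * p)).sum(axis=1) for p in likelihoods])
    norms = np.linalg.norm(V, axis=1)
    return V / norms[:, None], float(norms.sum())

four_vec, W = quadrature_prior()
rz = four_vec[3]
likelihoods = [0.5 * (1 + rz), 0.5 * (1 - rz)]         # p(+|r), p(-|r)
guesses, delta = optimal_guess_and_delta(likelihoods, four_vec, W)
fidelity = 0.5 * (1 + delta)
```

Each row of `guesses` is a unit 4-vector $(R^0_\chi, \vec R_\chi)$; its last three components give the Bloch vector of the guessed state, which here points along $\pm z$ with a length smaller than one.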

In the analysis below, it will prove very convenient to block-diagonalize $\rho^{\otimes N}$ by writing it in the basis of the SU(2) invariant subspaces of $(\mathbf{1/2})^{\otimes N}$ [we use boldfaced integers and half-integers to denote the irreducible representations of SU(2)], which are also invariant under the action of the symmetric group $S_N$ (see App. A and also [4, 5] for details). In contrast with pure states, for which $\rho^{\otimes N}$ has projection only in the symmetric (N + 1)-dimensional subspace of $J = N/2$, for mixed states $\rho^{\otimes N}$ has also components in all the lower-dimensional invariant subspaces, which, furthermore, occur with multiplicity, $n_j$, greater than one. We thus write 

$$\rho^{\otimes N} = \bigoplus_{j=0,1/2}^{N/2} n_j\,\rho_{Nj}, \tag{2.10}$$

where the block $\rho_{Nj}$ occurs $n_j$ times, the lower limit in the direct sum is 0 for even N and 1/2 for odd N, 

$$n_j = \binom{N}{N/2 - j}\,\frac{2j+1}{N/2 + j + 1}, \tag{2.11}$$

and 

$$\rho_{Nj} = \left(\frac{1-r^2}{4}\right)^{N/2-j}\rho_j, \tag{2.12}$$

with 

$$\rho_j = \sum_{m=-j}^{j}\left(\frac{1-r}{2}\right)^{j-m}\left(\frac{1+r}{2}\right)^{j+m} U(\vec n)\,|jm\rangle\langle jm|\,U^\dagger(\vec n). \tag{2.13}$$

Throughout this paper $U(\vec n)$ denotes the SU(2) unitary representation of the rotation $\mathcal R(\vec n)$ that takes the unit vector $\vec z$ (pointing along the z-axis) into $\vec n = \vec r/r$ on the Bloch sphere. Recall that 

$$\langle jm|U(\vec n)|jm'\rangle = D^{(j)}_{mm'}(\vec n) \tag{2.14}$$

defines the standard Wigner matrices [16]. Notice that $\rho_j$ are not proper density matrices, since $\mathrm{tr}\,\rho_j \neq 1$. 
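The bookkeeping behind the block decomposition (2.10) can be checked with a few lines of code. The sketch below, whose function names are illustrative, computes the multiplicities $n_j$ of Eq. (2.11) and verifies two consistency conditions: the block dimensions add up to $2^N$, and the probabilities of the outcomes j, $n_j\,\mathrm{tr}\,\rho_{Nj}$ from Eqs. (2.12) and (2.13), add up to one.

```python
import math

def multiplicity(N, twoj):
    """n_j of Eq. (2.11); twoj = 2j must have the same parity as N."""
    k = (N - twoj) // 2                        # k = N/2 - j
    # N/2 + j + 1 equals N - k + 1; the division is exact
    return math.comb(N, k) * (twoj + 1) // (N - k + 1)

def block_trace(N, twoj, r):
    """tr(rho_Nj) from Eqs. (2.12) and (2.13) for purity r."""
    a, b = (1 - r) / 2, (1 + r) / 2
    k = (N - twoj) // 2
    # the index i = j + m runs over 0, ..., 2j
    return (a * b) ** k * sum(a ** (twoj - i) * b ** i for i in range(twoj + 1))

def total_dimension(N):
    """Sum over j of n_j (2j + 1); should equal 2**N."""
    return sum(multiplicity(N, t) * (t + 1) for t in range(N % 2, N + 1, 2))

def total_probability(N, r):
    """Sum over j of n_j tr(rho_Nj); should equal 1 for any 0 <= r <= 1."""
    return sum(multiplicity(N, t) * block_trace(N, t, r)
               for t in range(N % 2, N + 1, 2))
```

For instance, `total_dimension(6)` can be compared with `2 ** 6`, and `total_probability(6, 0.37)` with 1. The quantity `multiplicity(N, t) * block_trace(N, t, r)` is the probability of obtaining the outcome j = t/2, which is the only source of information about the purity in the 3D case discussed below.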

For 2D states, the Bloch vector $\vec r$ of the state $\rho$ lies on the equatorial xy-plane of the Bloch sphere, i.e., $\vec r = r(\cos\theta, \sin\theta, 0)$. We are still entitled to use the decomposition of $\rho^{\otimes N}$ above, but now we write 

$$\rho_j = \sum_{m=-j}^{j}\left(\frac{1-r}{2}\right)^{j-m}\left(\frac{1+r}{2}\right)^{j+m} U(\theta)U(\vec x)\,|jm\rangle\langle jm|\,U^\dagger(\vec x)U^\dagger(\theta), \tag{2.15}$$

where $\vec x$ is the unit vector pointing along the x-axis and $U(\theta)$ is a unitary representation of a rotation of angle $\theta$ around the z-axis. Note that $U(\vec x)|jm\rangle$ is an eigenstate of $\vec x\cdot\vec J$ (i.e., of the projection of the total spin operator $\vec J$ along the x-axis), since $U(\vec x)$ takes $\vec z$ into $\vec x$ (i.e., is a rotation of angle $\pi/2$ around the y-axis). Hence, the Bloch vectors of the whole set of states $\{U(\theta)U(\vec x)|jm\rangle\}$ lie on the xy-plane, as they should, and $\theta$ is the angle between $\vec r$ and the x-axis. 

In the basis $|jm\rangle$ the transformation $U(\theta)$ is diagonal, and substituting (2.15) in (2.12) we obtain 

$$\rho_{Nj} = \sum_{m,m'} e^{i(m-m')\theta}\,\rho^{j}_{mm'}\,|jm\rangle\langle jm'|, \tag{2.16}$$

where 

$$\rho^{j}_{mm'} = \sum_{m''=-j}^{j} d^{j}_{mm''}(\pi/2)\,d^{j}_{m'm''}(\pi/2)\left(\frac{1-r}{2}\right)^{N/2-m''}\left(\frac{1+r}{2}\right)^{N/2+m''} \tag{2.17}$$

and $d^{j}_{mm'}$ are the (real) reduced Wigner matrices [16]. 



III. FINITE NUMBER OF COPIES. BAYESIAN 
ESTIMATOR 

In this section we obtain the optimal POVM and closed 
expressions of the fidelity for any number of copies of the 
signal state. Although the 3D and 2D cases look sim- 
ilar, we will show that there are remarkable differences 
between them. 



A. 3D states 

As mentioned in the introduction, we consider N identical copies of a quantum state which is chosen according to an isotropic prior distribution 

$$d\rho = w(r)\,dr\,dn, \tag{3.1}$$

where dn is the invariant measure on the 2-sphere 

$$dn = \frac{d(\cos\theta)\,d\phi}{4\pi} \tag{3.2}$$

and $w(r)$ is normalized such that $\int_0^1 dr\,w(r) = 1$. 

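Let us start by computing the optimal POVM. We first notice that because of the block-diagonal form of $\rho^{\otimes N}$ (2.10) we may just consider also block-diagonal POVMs, of the form 

$$O_\chi = \bigoplus_{j=0,1/2}^{J} n_j\,O_{\chi j} \quad\text{such that}\quad \sum_\chi O_{\chi j} = \mathbb{1}_j, \tag{3.3}$$

with no loss of generality. Indeed, for any given POVM $\{O_\chi\}$, we can always construct a new one, $\{\tilde O_{\chi j\alpha}\}$, through 

$$\tilde O_{\chi j\alpha} = \mathbb{1}_{j\alpha}\,O_\chi\,\mathbb{1}_{j\alpha}, \tag{3.4}$$

where $\mathbb{1}_{j\alpha}$ is the identity in the j-representation subspace and $\alpha$ ($1 \le \alpha \le n_j$) labels the different occurrences of j in the Clebsch-Gordan series of $(\mathbf{1/2})^{\otimes N}$. If $F$ ($\tilde F$) stands for the maximum fidelity that can be attained using $\{O_\chi\}$ ($\{\tilde O_{\chi j\alpha}\}$), we have $F \le \tilde F$. This is readily seen by noticing that the probability $p(\chi|\vec r) = \mathrm{tr}[\rho^{\otimes N}O_\chi]$ is the marginal of $p(\chi j\alpha|\vec r) = \mathrm{tr}[\rho^{\otimes N}\tilde O_{\chi j\alpha}]$, i.e., $p(\chi|\vec r) = \sum_{j\alpha} p(\chi j\alpha|\vec r)$, and no marginal can be more informative than the initial probability distribution. Moreover, because of (2.10), if $\{\tilde O_{\chi j\alpha}\}$ is to be optimal, we may obviously replace $\tilde O_{\chi j1}, \tilde O_{\chi j2}, \ldots, \tilde O_{\chi jn_j}$ by, say, $\tilde O_{\chi j1}, \tilde O_{\chi j1}, \ldots, \tilde O_{\chi j1}$ without changing the fidelity, which leads us to (3.3). 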

It is important to note that (3.4) allows us to view j and $\alpha$ as the outcome of the measurement $\{\mathbb{1}_{j\alpha}\}$. Therefore, in Eq. (2.9) we will have $n_j|\mathbf V_{\chi j}|$ instead of $|\mathbf V_\chi|$, and an additional summation over j. Hence, our goal is to maximize $|\mathbf V_{\chi j}|$ for all pairs $(\chi, j)$, where 

$$\mathbf V_{\chi j} = \int d\rho\;\mathbf r\;\mathrm{tr}(\rho_{Nj}\,O_{\chi j}). \tag{3.5}$$

The j outcomes give information about the decomposition of $\rho^{\otimes N}$ as a direct sum of SU(2) irreducible components. This, in turn, encodes information about r. For instance, if r = 1 (pure state), the probability of obtaining the outcome j = N/2 is unity. For our purposes, all the information concerning the purity of $\rho$ comes from this source, as we now demonstrate. 

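Since $V^0_{\chi j}$ is invariant under rotations, whereas $\vec V_{\chi j}$ transforms as a 3-vector, we may apply to $\vec V_{\chi j}$ the rotation $\mathcal R^{-1}(\vec n_{\chi j})$, where $\vec n_{\chi j} = \vec V_{\chi j}/|\vec V_{\chi j}|$, and obtain $\vec V'_{\chi j}$, such that its x- and y-components vanish, i.e., $V'^{x}_{\chi j} = V'^{y}_{\chi j} = 0$ and 

$$V'^{z}_{\chi j} = \int d\rho\,[\mathcal R^{-1}(\vec n_{\chi j})\,\vec r\,]^{z}\,\mathrm{tr}(\rho_{Nj}O_{\chi j}) = \int d\rho\; r\cos\theta\;\mathrm{tr}(\rho_{Nj}\Omega_{\chi j}), \tag{3.6}$$

$$V'^{0}_{\chi j} = \int d\rho\,\sqrt{1-r^2}\;\mathrm{tr}(\rho_{Nj}\Omega_{\chi j}), \tag{3.7}$$

where we have defined 

$$\Omega_{\chi j} = U^\dagger(\vec n_{\chi j})\,O_{\chi j}\,U(\vec n_{\chi j}), \tag{3.8}$$

we have used that $d\rho$ is rotationally invariant, and we have written $\vec r = r\vec n$ in spherical coordinates, i.e., $\vec n = (\sin\theta\cos\phi, \sin\theta\sin\phi, \cos\theta)$. Therefore, $|\mathbf V_{\chi j}| = |\mathbf V'_{\chi j}|$, and the maximum fidelity can be computed using $\mathbf V'_{\chi j}$ instead of $\mathbf V_{\chi j}$. Hereafter, we drop the primes and write 

$$\mathbf V_{\chi j} = (V^0_{\chi j}, 0, 0, V^z_{\chi j}), \tag{3.9}$$

where $V^z_{\chi j}$, $V^0_{\chi j}$ are given by (3.6) and (3.7). 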

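Using Eqs. (2.12-2.14) and recalling that $\cos\theta = D^{(1)}_{00}(\vec n)$, we have 

$$V^z_{\chi j} = \int_0^1 dr\,w(r)\,r \sum_{m\,m'\,m''} p_{jm}\,[\Omega_{\chi j}]_{m''m'} \int dn\; D^{(1)}_{00}(\vec n)\,D^{(j)}_{m'm}(\vec n)\,D^{(j)*}_{m''m}(\vec n), \tag{3.10}$$

$$V^0_{\chi j} = \int_0^1 dr\,w(r)\sqrt{1-r^2} \sum_{m\,m'\,m''} p_{jm}\,[\Omega_{\chi j}]_{m''m'} \int dn\; D^{(j)}_{m'm}(\vec n)\,D^{(j)*}_{m''m}(\vec n), \tag{3.11}$$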



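where the sum over the indexes m, m', m'' runs from $-j$ to $j$, and we have defined 

$$p_{jm} = \left(\frac{1-r^2}{4}\right)^{J-j}\left(\frac{1-r}{2}\right)^{j-m}\left(\frac{1+r}{2}\right)^{j+m}. \tag{3.12}$$

The orthogonality relations of the irreducible representations of SU(2) (Eqs. (4.6.1) and (4.6.2) on page 62 of Ref. [16]) enable us to write 

$$V^z_{\chi j} = \int_0^1 dr\,\frac{w(r)\,r}{d_j\,j(j+1)} \sum_{m\,m'} m\,m'\,p_{jm}\,[\Omega_{\chi j}]_{m'm'}, \tag{3.13}$$

$$V^0_{\chi j} = \int_0^1 dr\,\frac{w(r)\sqrt{1-r^2}}{d_j} \sum_{m\,m'} p_{jm}\,[\Omega_{\chi j}]_{m'm'}, \tag{3.14}$$

where $d_j = 2j + 1$ is the dimension of the representation j of SU(2). We readily see that the z- and 0-components of $\mathbf V_{\chi j}$ are bounded by 

$$|V^z_{\chi j}| \le \frac{\mathrm{tr}\,\Omega_{\chi j}}{d_j\,(j+1)}\left|\int_0^1 dr\,w(r)\,r \sum_m m\,p_{jm}\right|, \tag{3.15}$$

$$V^0_{\chi j} = \frac{\mathrm{tr}\,\Omega_{\chi j}}{d_j}\int_0^1 dr\,w(r)\sqrt{1-r^2}\,\sum_m p_{jm}. \tag{3.16}$$

Note that all the $\chi$ dependence has been factored out and $\Delta_{3D}$ takes the form 

$$\Delta_{3D} \le \sum_j n_j \left(\sum_\chi \frac{\mathrm{tr}\,\Omega_{\chi j}}{d_j}\right)\sqrt{(v_j^0)^2 + (v_j)^2}, \tag{3.17}$$

where $v_j^0$ and $v_j$ can be easily worked out from (3.15) and (3.16) to be 

$$v_j = \int_0^1 dr\,\frac{w(r)\,r}{j+1}\sum_{m=-j}^{j} m\,p_{jm}, \tag{3.18}$$

$$v_j^0 = \int_0^1 dr\,w(r)\sqrt{1-r^2}\,\sum_{m=-j}^{j} p_{jm}. \tag{3.19}$$

Eq. (3.8) clearly implies that the factor in parentheses in (3.17) is unity (since $\mathrm{tr}\,\Omega_{\chi j} = \mathrm{tr}\,O_{\chi j}$ and $\sum_\chi O_{\chi j} = \mathbb{1}_j$). Notice that the $\chi$ dependence has entirely disappeared in the final bound of the fidelity. 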

Inequality (3.17) is saturated iff the only non-vanishing term of the sum over m' in (3.13) corresponds to the maximum value of $|m'|$, namely, j. This implies that $[\Omega_{\chi j}]_{m'm'} \propto \delta_{m'j}$ (or the trivial symmetric choice $\delta_{m'\,-j}$). An obvious choice that satisfies this condition, and is independent of $\chi$, is 

$$\Omega_j = d_j\,|jj\rangle\langle jj|. \tag{3.20}$$

The operator $\Omega_j$ is a seed of a continuous covariant POVM, i.e., 

$$O_{\mu j} = U(\vec\mu)\,\Omega_j\,U^\dagger(\vec\mu), \tag{3.21}$$

where $\mu$ plays the role of $\chi$. It can be easily verified that $\int d\mu\,O_{\mu j} = \mathbb{1}_j$ [1], where $d\mu$ (as dn) is the invariant measure over the 2-sphere. This proves that the bound is attainable. POVMs with a finite number of outcomes can also be obtained using the results in [17]. 

Having obtained the optimal POVM, Eq. (3.21), it is straightforward to compute the conditional probabilities 

$$\mathrm{tr}(\rho_{Nj}\,O_{\mu j}) = d_j\left(\frac{1-r^2}{4}\right)^{J-j}\left(\frac{1+\vec r\cdot\vec\mu}{2}\right)^{2j}, \tag{3.22}$$

which will be needed in Sec. V. One can check that 

$$\sum_j n_j \int d\mu\;\mathrm{tr}(\rho_{Nj}\,O_{\mu j}) = 1, \tag{3.23}$$

as it should be. The corresponding guesses can be worked out from (3.5) by simply substituting $\mu$ for $\chi$. One can also verify that the angular integration indeed yields the two terms (3.18) and (3.19). 
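Since the bound (3.17) is attainable and its factor in parentheses is unity, the optimal 3D fidelity for finite N is $F = (1 + \Delta_{3D})/2$ with $\Delta_{3D} = \sum_j n_j\sqrt{(v_j^0)^2 + (v_j)^2}$ and $v_j$, $v_j^0$ given by (3.18) and (3.19). The following sketch evaluates this sum numerically. It represents the prior by quadrature nodes and weights, so that a pure-state prior is a single node at r = 1; the hard-sphere prior $w(r) = 3r^2$ used in the helper is an assumption for illustration only, and all function names are hypothetical.

```python
import math
import numpy as np

def multiplicity(N, twoj):
    """n_j of Eq. (2.11), with j = twoj / 2."""
    k = (N - twoj) // 2
    return math.comb(N, k) * (twoj + 1) // (N - k + 1)

def delta_3d(N, r_nodes, weights):
    """Delta_3D = sum_j n_j sqrt(v0_j^2 + v_j^2), using Eqs. (3.18) and (3.19).

    The prior enters through sum_k weights[k] * g(r_nodes[k]) ~ int dr w(r) g(r).
    Intended for moderate N (floating-point range limits very large N).
    """
    r = np.asarray(r_nodes, dtype=float)
    w = np.asarray(weights, dtype=float)
    a, b = (1 - r) / 2, (1 + r) / 2
    total = 0.0
    for twoj in range(N % 2, N + 1, 2):
        j = twoj / 2
        k = (N - twoj) // 2                    # J - j with J = N/2
        prefactor = (a * b) ** k
        sum_p = np.zeros_like(r)               # sum_m p_jm
        sum_mp = np.zeros_like(r)              # sum_m m p_jm
        for i in range(twoj + 1):              # i = j + m
            p = prefactor * a ** (twoj - i) * b ** i   # p_jm of Eq. (3.12)
            sum_p += p
            sum_mp += (i - j) * p
        v0 = np.sum(w * np.sqrt(1 - r ** 2) * sum_p)   # Eq. (3.19)
        v = np.sum(w * r * sum_mp) / (j + 1)           # Eq. (3.18)
        total += multiplicity(N, twoj) * math.hypot(v0, v)
    return total

def hard_sphere_nodes(n=200):
    """Gauss-Legendre nodes/weights for the ASSUMED prior w(r) = 3 r^2 on [0, 1]."""
    x, wx = np.polynomial.legendre.leggauss(n)
    r = 0.5 * (x + 1)
    return r, 0.5 * wx * 3 * r ** 2

# Pure-state prior: a single node at r = 1
fidelity_pure_n4 = 0.5 * (1 + delta_3d(4, [1.0], [1.0]))
# Mixed states with the assumed hard-sphere prior
r_nodes, weights = hard_sphere_nodes()
fidelity_mixed_n4 = 0.5 * (1 + delta_3d(4, r_nodes, weights))
```

For a pure-state prior only j = N/2, m = j contributes, so the sum reduces to $\Delta = N/(N+2)$, i.e., the familiar pure-state value $F = (N+1)/(N+2)$; this provides a convenient check of the reconstruction.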

In summary, the fidelity of any optimal POVM can be written, through (2.9), in terms of 

$$\Delta_{3D} = \sum_j n_j \sqrt{(v_j^0)^2 + (v_j)^2}. \tag{3.24}$$

This equation, along with (3.18) and (3.19), provides a general expression of the maximum fidelity for any given prior distribution $w(r)$. Unless an explicit expression for $w(r)$ is given, this is as far as we can get. In App. C we present closed expressions of the fidelity for arbitrary N using the Bures prior. In the asymptotic limit $N \to \infty$, however, one can derive a compact formula for the fidelity in terms of the mean value of r: $\langle r\rangle = \int dr\,w(r)\,r$. This will be done in Secs. IV and V. 

Several comments are in order here. Within an optimal scheme, the purity estimator, 

$$R_j = \frac{v_j}{\sqrt{(v_j^0)^2 + (v_j)^2}}, \tag{3.25}$$

only depends on j and comes solely from the measurement represented by the POVM $\{\mathbb{1}_{j\alpha}\}$ [18]. All dependence on any other kind of outcome, generically referred to as $\chi$ [e.g., $\mu$ in Eq. (3.21)], has disappeared. This is expected from symmetry grounds: the parameter r does not change under SU(2) transformations and the optimal purity guess must thus be a function of j/N, as the only SU(2)-invariant quantity in this problem is precisely j. Furthermore, since this measurement ($\{\mathbb{1}_{j\alpha}\}$) does not alter (on average) the estimation of the orientation $\vec n = \vec r/r$ of the signal state, the optimal estimation in the sense of average fidelity of (a priori) isotropically distributed mixed states breaks into two independent estimations: that of the purity r and that of the orientation $\vec n$ in the Bloch sphere. Notice finally that after this measurement, the rest of the protocol, which involves the POVM (3.21) for a fixed j (or any version of it with a finite number of outcomes), is identical to the optimal protocol for estimating a pure state $|\vec n\rangle$ given 2j identical copies of it [2]. 



B. 2D states 

In the situation we are about to consider, $\mathbf V_{\chi j}$, defined by (3.5), still determines the maximum fidelity through Eq. (2.9), but $d\rho$ is 

$$d\rho = w(r)\,dr\,\frac{d\theta}{2\pi}, \tag{3.26}$$

with $\int_0^1 dr\,w(r) = 1$. Since $\vec r$ is a 2-dimensional vector, we can use a complex notation and write $\vec r \to r e^{i\theta}$. In this notation $\vec V_{\chi j}$ and $\vec R_{\chi j}$ also become complex numbers. More specifically, 

$$V^0_{\chi j} = \sum_m \int_0^1 dr\,w(r)\sqrt{1-r^2}\;\rho^{j}_{mm}\,O^{\chi j}_{mm}, \tag{3.27}$$

where we have raised the outcome labels $\chi$ and j in $O^{\chi j}_{mm}$ [cf. $\rho^{j}_{mm'}$, Eq. (2.17)] to avoid a confusing proliferation of subindexes; the latter will label matrix elements, e.g., $O^{\chi j}_{mm'} \equiv \langle jm|O_{\chi j}|jm'\rangle$. Similarly, we have 



$$|V_{\chi j}| = \left|\int d\rho\; r \sum_{m\,m'} e^{i(m-m'+1)\theta}\,\rho^{j}_{mm'}\,O^{\chi j}_{m'm}\right| \le \int_0^1 dr\,w(r)\,r \sum_m \rho^{j}_{m\,m+1}\,\big|O^{\chi j}_{m+1\,m}\big|, \tag{3.28}$$

where we have used that $\rho^{j}_{m\,m+1} \ge 0$ for all r. The equality in (3.28) is attained by choosing the phase of $O^{\chi j}_{m+1\,m}$ to be independent of m. 

The positivity of $O_{\chi j}$ implies that 

$$\big|O^{\chi j}_{m+1\,m}\big| \le \sqrt{O^{\chi j}_{mm}\,O^{\chi j}_{m+1\,m+1}}. \tag{3.29}$$

By choosing $|O^{\chi j}_{m+1\,m}|$ to take its maximum value in (3.29) we ensure that $|V_{\chi j}|$ will also be maximal. So far, the optimization of $V^0_{\chi j}$ and $|V_{\chi j}|$ can be carried out independently of one another, since the choices we have to make in order to saturate the bounds in (3.28) and (3.29) do not affect $V^0_{\chi j}$. However, we will have to check that they are compatible with the POVM condition $\sum_\chi O_{\chi j} = \mathbb{1}_j$. We will verify this by giving an explicit POVM that meets all the above conditions. 

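We now replace $O_{\chi j}$ by its covariant version $O_{\chi j\phi}$, defined in (D3) (in Appendix D we show that this change does not affect the average fidelity), and take the seed (positive) operator $\Omega_{\chi j}$ in (D1) to be given by $\Omega_{\chi j} = |u_{\chi j}\rangle\langle u_{\chi j}|$ (i.e., to be rank one), where 

$$|u_{\chi j}\rangle = \sum_m u^{\chi j}_m\,|jm\rangle. \tag{3.30}$$

The components $u^{\chi j}_m$ are taken to be real and must satisfy 

$$\sum_\chi \big(u^{\chi j}_m\big)^2 = 1, \tag{3.31}$$

as follows from 

$$O^{\chi j\phi}_{mm'} = e^{i(m-m')\phi}\,u^{\chi j}_m\,u^{\chi j}_{m'}. \tag{3.32}$$

It is important to realize that the vanishing of the off-diagonal elements in $\sum_\chi \int_0^{2\pi}\frac{d\phi}{2\pi}\,O_{\chi j\phi} = \mathbb{1}_j$ does not require further conditions on $u^{\chi j}_m$. Moreover, 

$$O^{\chi j\phi}_{m+1\,m} = e^{i\phi}\,u^{\chi j}_{m+1}\,u^{\chi j}_m = e^{i\phi}\sqrt{O^{\chi j\phi}_{m+1\,m+1}\,O^{\chi j\phi}_{mm}}, \tag{3.33}$$

hence, this choice saturates both (3.28) and (3.29). 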

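Collecting all the pieces and defining $\Delta_{2D} = \sum_j n_j\,\Delta^{2D}_j$ [recall that $F = (1 + \Delta)/2$], we see that the maximum fidelity is given by the maximum value of 

$$\Delta^{2D}_j = \sum_\chi \left[\Big(\sum_m \alpha^j_m\,\big(u^{\chi j}_m\big)^2\Big)^2 + \Big(\sum_m \beta^j_m\,u^{\chi j}_m\,u^{\chi j}_{m+1}\Big)^2\right]^{1/2}, \tag{3.34}$$

where $u^{\chi j}_m$ is constrained by (3.31) and $\alpha^j_m$ and $\beta^j_m$ can be read off from (3.27) and (3.28) respectively: 

$$\alpha^j_m = \int_0^1 dr\,w(r)\sqrt{1-r^2}\;\rho^{j}_{mm}, \tag{3.35}$$

$$\beta^j_m = \int_0^1 dr\,w(r)\,r\,\rho^{j}_{m\,m+1}. \tag{3.36}$$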



With no loss of generality we can take the index $\chi$ in (3.34) to be integer and its maximum value to be less than or equal to the number of distinct values of $\alpha^j_m$ in (3.35). The symmetry relation $d^{j}_{mm'} = d^{j}_{-m'\,-m}$ further implies that $\chi \le [d_j/2]$, where [. . . ] stands for integer part. With all the above, maximizing $\Delta_{2D}$, which can be done for each j independently, becomes a straightforward task. 

The results of the 3D case may lead us to believe that the optimal POVM will be independent of the prior $w(r)$. The inspection of the low N cases gives further support to this belief. For $j \le 5/2$ ($N \le 5$) one can show that the optimal POVM is given by 

$$u^{j}_m = 1 \tag{3.37}$$

for any prior $w(r)$, where we have dropped the index $\chi$ because it only takes one value here.$^1$ However, one can check that for $j \ge 3$ the choice (3.37) is not optimal for some priors. Take for instance N = 6 and consider a prior of the form $w(r) = (2r/\delta^2)\,\Theta(\delta - r)$, where $\Theta(x)$ is the step function [i.e., $\Theta(x) = 1$ for $x > 0$ and $\Theta(x) = 0$ otherwise] and $\delta$ is a positive number. If $\delta$ is sufficiently small, one can Taylor-expand $\Delta_3$ about $\delta = 0$ and easily obtain the optimal solution at leading order, which does not turn out to be of the form (3.37). A straight- 
forward computation yields (A°p* - Ag^^-^^-^^^A^P* = 
A6'^ + 0{6^), where A is a constant that can be com- 
puted analytically {A « 1.0 x 10~^). 

In spite of this unexpected dependence on the prior in the 2D case, there are, however, two features in the example above that are completely general: (a) the difference $\Delta^{\rm opt} - \Delta^{\rm Eq.(3.37)}$ is always very small, and (b) $\Delta^{\rm opt}$ is actually different from $\Delta^{\rm Eq.(3.37)}$ only for priors that are very peaked about r = 0. There is a further, very important property: the POVM defined by (3.37) is asymptotically optimal (the proof is given in Appendix H). Hence, for practical purposes, the best one can do is to stick to the choice (3.37), for all j and m, regardless of the prior knowledge one may have of $\rho$. Though this choice does not guarantee optimality for small N, it does guarantee that the corresponding fidelity will differ from the maximum one by a tiny amount (typically less than one part in a thousand) and, furthermore, that this difference will decrease to zero as $N \to \infty$. 

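The asymptotically optimal choice (3.37) amounts to replacing $O_{\chi j}$ by 

$$O_{j\phi} = U(\phi)\,\Omega_j\,U^\dagger(\phi), \tag{3.38}$$

where $\Omega_j = |u_j\rangle\langle u_j|$, $|u_j\rangle = \sum_m |jm\rangle$, and [hereafter we drop the superindex "Eq. (3.37)" in $\Delta$, $\Delta_j$, etc.] 

$$\Delta_{2D} = \sum_j n_j \sqrt{(v_j^0)^2 + (v_j)^2}, \tag{3.39}$$
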

1 There are also degenerate solutions of the form it^ 
m, and with ^X3 ~ ^ 
where 

$$v_j^0 = \sum_m \alpha^j_m, \qquad v_j = \sum_m \beta^j_m, \tag{3.40}$$

and the analogy with (3.24) is apparent. 

We next recall (3.35), which involves $\mathrm{tr}\,\rho_{Nj}$. Since the trace is invariant under rotations, $v_j^0$ can be straightforwardly computed using (2.12) and (2.13). No such simplification exists for $v_j$, as far as we are aware. Proceeding this way we have 

$$v_j^0 = 2\sum_{m=-j}^{j}\int_0^1 dr\,w(r)\left(\frac{1-r}{2}\right)^{J-m+\frac12}\left(\frac{1+r}{2}\right)^{J+m+\frac12},$$

$$v_j = \sum_{m=-j}^{j} c^j_m \int_0^1 dr\,r\,w(r)\left(\frac{1-r}{2}\right)^{J-m}\left(\frac{1+r}{2}\right)^{J+m}, \tag{3.41}$$

where the coefficients $c^j_m$ are given by 

$$c^j_m = \sum_{m'=-j}^{j-1} d^{j}_{m'm}(\pi/2)\,d^{j}_{m'+1\,m}(\pi/2), \tag{3.42}$$

as can be read off from (2.17). The sum over m in $v_j^0$ can be easily performed, since it is just the sum of a geometric series, and yields 

$$v_j^0 = 2\int_0^1 dr\,\frac{w(r)}{r}\left\{\left(\frac{1-r}{2}\right)^{J-j+\frac12}\left(\frac{1+r}{2}\right)^{J+j+\frac32} - (r \to -r)\right\}. \tag{3.43}$$

The sum over m in $v_j$, however, is non trivial because of the coefficients $c^j_m$ and no simple closed formula can be found but in the asymptotic limit $N \to \infty$. 



IV. ASYMPTOTICS: BAYESIAN APPROACH 

In this section we calculate the asymptotic (large N) expressions of the fidelities obtained in the previous sections using the Bayesian approach. For 2D they are summarized in (3.39), with the definitions (3.41), (3.42) and the relation (3.43). For 3D the maximum fidelity is given by (3.24), which involves the definitions (3.18) and (3.19). We here present a detailed computation only for 2D. The 3D case can be computed in a similar way and we just point out the main differences with 2D. For simplicity we consider an even number of copies N = 2n, thus J = n. 

We start by noticing that the coefficients $c^j_m$, defined in Eq. (3.42), satisfy $c^j_{-m} = -c^j_m$ (which implies $c^j_0 = 0$) and, hence, 

$$v_j = \sum_{m>0} c^j_m \int_0^1 dr\,r\,w(r)\left\{\left(\frac{1-r}{2}\right)^{n-m}\left(\frac{1+r}{2}\right)^{n+m} - (r \to -r)\right\}. \tag{4.1}$$
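The finite-N 2D quantities entering (3.39), i.e., $v_j^0$ and $v_j$ of (3.41) together with the coefficients $c^j_m$ of (3.42), can be evaluated numerically. The sketch below builds the reduced Wigner matrix $d^j(\pi/2)$ as $\exp(-i\frac{\pi}{2}J_y)$ in the standard $|jm\rangle$ basis (an implementation choice, not the authors' method), checks the antisymmetry $c^j_{-m} = -c^j_m$, and computes $\Delta_{2D}$ for the prior-independent choice (3.37). The prior is passed as quadrature nodes and weights; all function names are hypothetical.

```python
import math
import numpy as np

def multiplicity(N, twoj):
    """n_j of Eq. (2.11), with j = twoj / 2."""
    k = (N - twoj) // 2
    return math.comb(N, k) * (twoj + 1) // (N - k + 1)

def wigner_d_half_pi(twoj):
    """d^j(pi/2) = exp(-i (pi/2) J_y); row/column index i corresponds to m = -j + i."""
    j = twoj / 2
    m = np.arange(-j, j + 1)
    # <j, m+1| J_+ |j, m> = sqrt(j(j+1) - m(m+1)) on the first sub-diagonal
    j_plus = np.diag(np.sqrt(j * (j + 1) - m[:-1] * (m[:-1] + 1)), k=-1)
    j_y = (j_plus - j_plus.T) / (2 * 1j)
    vals, vecs = np.linalg.eigh(j_y)
    d = (vecs * np.exp(-1j * (np.pi / 2) * vals)) @ vecs.conj().T
    return d.real                                  # d^j is real

def c_coefficients(twoj):
    """c^j_m of Eq. (3.42): sum over m' of d_{m' m} d_{m'+1 m} at angle pi/2."""
    d = wigner_d_half_pi(twoj)
    return np.sum(d[:-1, :] * d[1:, :], axis=0)

def delta_2d(N, r_nodes, weights):
    """Delta_2D of Eq. (3.39) for the choice (3.37), using Eq. (3.41)."""
    r = np.asarray(r_nodes, dtype=float)
    w = np.asarray(weights, dtype=float)
    a, b = (1 - r) / 2, (1 + r) / 2
    total = 0.0
    for twoj in range(N % 2, N + 1, 2):
        k = (N - twoj) // 2                        # J - j with J = N/2
        c = c_coefficients(twoj)
        # powers[i] = ((1-r)/2)^(J-m) ((1+r)/2)^(J+m) with m = -j + i
        powers = np.array([a ** (k + twoj - i) * b ** (k + i)
                           for i in range(twoj + 1)])
        v0 = np.sum(w * np.sqrt(1 - r ** 2) * powers.sum(axis=0))
        v = np.sum(w * r * (c @ powers))
        total += multiplicity(N, twoj) * math.hypot(v0, v)
    return total

# Pure equatorial states (single node at r = 1), two copies
fidelity_pure_n2 = 0.5 * (1 + delta_2d(2, [1.0], [1.0]))
```

For pure equatorial states the result reduces to the known phase-estimation values, e.g. $\Delta = 1/2$ for N = 1 and $\Delta = 1/\sqrt{2}$ for N = 2, which serves as a check of the conventions used here.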



We further note that the dominant contribution to the sum in $v_j$ comes from the region where m is close to its maximum value j. We can thus replace $c^j_m$ by the first terms of its "Taylor expansion" about m = j. It turns out that only the first two terms, $c^j_m \approx a_j + b_j\,(m - j)$, contribute at the order we are interested in. The coefficients $a_j$ and $b_j$ are computed in Appendix E. After substituting Eq. (E6) in (4.1) the sum over m gives: 



Jo 



\r^ 



1 + r 



1 

n+j+l 



1 



(r —r) 



(4.2) 



where we have dropped terms that fall off exponentially as n goes to infinity. It is convenient to combine $v_j^0$ and $v_j$ with the binomial in $n_j$ [see Eq. (2.11)] and define $\tilde v_j^0$ and $\tilde v_j$ as 

$$\tilde v_j^0 = \binom{2n}{n-j}\,v_j^0, \qquad \tilde v_j = \binom{2n}{n-j}\,v_j. \tag{4.3}$$

With this, Eq. (3.39) becomes 

$$\Delta_{2D} = \sum_{j=0}^{n}\frac{d_j}{n+j+1}\sqrt{(\tilde v_j^0)^2 + (\tilde v_j)^2}. \tag{4.4}$$

Our goal is to compute the asymptotic behaviour of the above sum. We do so by first computing the leading order contribution: $\lim_{n\to\infty}\Delta_{2D}$. We, of course, expect this to be unity, as the optimal guess must certainly lead to a perfect estimation given infinitely many copies. The calculation thus provides a consistency check of the approach and, moreover, the leading order expression of $\tilde v_j^0$ and $\tilde v_j$, which will be later used to compute the next-to-leading order contribution. 

At leading order in 1/n, we are entitled to use the well known result 

$$\binom{2n}{k}\,q^k(1-q)^{2n-k} \approx \frac{\exp\left[-\dfrac{(k-2nq)^2}{4nq(1-q)}\right]}{2\sqrt{\pi n q(1-q)}}, \tag{4.5}$$

which holds for large n. In our case $k = n - j$ and $q = (1 - r)/2$. Furthermore, we can approximate the gaussian in (4.5) by the Dirac delta function $\delta(k - 2nq) = \delta(nr - j) = \delta(r - j/n)/n$. After a straightforward calculation we end up with 

$$\tilde v_j^0 = \frac{\sqrt{1-s^2}}{2ns}\,w(s)(1+s) + o(1/n), \qquad \tilde v_j = \frac{1}{2n}\,w(s)(1+s) + o(1/n), \tag{4.6}$$



where s = j/n. 
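The quality of the Gaussian approximation (4.5) is easy to probe numerically. The following sketch compares the exact binomial weight with the right-hand side of (4.5) and reports the largest deviation relative to the peak height; the function names and the sample values of n and q are illustrative choices only.

```python
import math

def binomial_weight(n, k, q):
    """Exact C(2n, k) q^k (1-q)^(2n-k), evaluated via log-gamma to avoid overflow."""
    log_p = (math.lgamma(2 * n + 1) - math.lgamma(k + 1) - math.lgamma(2 * n - k + 1)
             + k * math.log(q) + (2 * n - k) * math.log(1 - q))
    return math.exp(log_p)

def gaussian_approx(n, k, q):
    """Right-hand side of Eq. (4.5)."""
    s2 = n * q * (1 - q)
    return math.exp(-(k - 2 * n * q) ** 2 / (4 * s2)) / (2 * math.sqrt(math.pi * s2))

def max_relative_gap(n, q):
    """Largest |exact - approx| over k = 0..2n, in units of the Gaussian peak height."""
    peak = gaussian_approx(n, 2 * n * q, q)
    gap = max(abs(binomial_weight(n, k, q) - gaussian_approx(n, k, q))
              for k in range(2 * n + 1))
    return gap / peak

# q = (1 - r)/2 with r = 0.4, for two values of n
gap_small_n = max_relative_gap(500, 0.3)
gap_large_n = max_relative_gap(2000, 0.3)
```

The gap shrinks as n grows, in line with the statement that (4.5) holds for large n; the residual is dominated by the skewness of the binomial, which the Gaussian does not capture.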

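Recalling the derivation of Eq. (3.39), we see that the optimal guess for the purity only depends on j and is given by 

$$R_j = \frac{v_j}{\sqrt{(v_j^0)^2 + (v_j)^2}}, \tag{4.7}$$

in full analogy with (3.25). [The optimal guess for $\theta$ is given by Eq. (3.38).] One readily obtains 

$$R_j = \frac{j}{n} + \dots, \tag{4.8}$$

as expected, where the dots stand for subleading terms. Similarly, it also follows from (4.6) that 

$$\sqrt{(\tilde v_j^0)^2 + (\tilde v_j)^2} = \frac{w(s)(1+s)}{2ns} + o(1/n). \tag{4.9}$$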



At leading order the sum over j in (4.4) can be replaced by $n\int_0^1 ds$, and $d_j/(n + j + 1) \approx 2j/(n + j) = 2s/(1 + s)$. Hence, at leading order 

$$\Delta_{2D} = \int_0^1 ds\,w(s) = 1, \tag{4.10}$$

and, as it should be, $\lim_{N\to\infty} F = 1$ for any prior. 

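We are now ready to compute the fidelity to next-to-leading order. The calculation can be greatly simplified by noticing that 

$$\Delta_{2D} \ge \sum_{j=0}^{n}\frac{d_j}{n+j+1}\left[\sqrt{1-S_j^2}\;\tilde v_j^0 + S_j\,\tilde v_j\right] \tag{4.11}$$

for all $S_j$ such that $0 \le S_j \le 1$ [this is, in reverse, the same argument that took us from (2.7) to (2.9)]. The bound is saturated iff 

$$\big(\sqrt{1-S_j^2},\,S_j\big) \propto \big(\tilde v_j^0,\,\tilde v_j\big) \tag{4.12}$$

for all j, namely, iff $S_j = R_j$. With the leading order choice $S_j = j/n$, Eq. (4.11) provides a tight bound at order o(1/n). At next-to-leading order we thus have 

$$\Delta_{2D} = \sum_{j=0}^{n}\frac{d_j}{n+j+1}\left[\sqrt{1-\frac{j^2}{n^2}}\;\tilde v_j^0 + \frac{j}{n}\,\tilde v_j\right] + o(1/n), \tag{4.13}$$

where we have "linearized" the square root in (4.4), hence overcoming in a very simple way the most demanding part of the calculation. We can now use the techniques in Appendix F to evaluate the asymptotic value of this sum. We obtain 

$$\Delta_{2D} = \left(1 - \frac{1}{2n}\right)\int_0^1 dr\,w(r) + o(1/n), \tag{4.14}$$

which implies 

$$F_{2D} = 1 - \frac{1}{2N} + o(1/N), \tag{4.15}$$

independently of the prior $w(r)$. This result agrees with the bound derived from the pointwise approach in the next section. 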

The very same approach we have outlined can be applied to 3D states; we just have to replace the 2D $v_j$ by its 3D counterpart [see Sec. III A and Eqs. (C1), (C2) and (C3)]. To next-to-leading order we have (see Appendix F for details) 

$$\Delta_{3D} = \int_0^1 dr\,w(r)\left(1 - \frac{3+2r}{4n}\right) + o(1/n). \tag{4.16}$$

Recalling that n = N/2, the asymptotic fidelity reads 

$$F_{3D} = 1 - \frac{3 + 2\langle r\rangle}{4N} + o(1/N), \tag{4.17}$$

where $\langle r\rangle$ stands for the mean purity over its prior distribution, namely 

$$\langle r\rangle = \int_0^1 dr\,w(r)\,r. \tag{4.18}$$

Particularizing (4.17) to the Bures distribution, Eq. (C4), we have 

$$F^{3D}_{\rm Bures} = 1 - \frac{9\pi + 16}{12\pi}\,\frac{1}{N} + o(1/N). \tag{4.19}$$

V. ASYMPTOTICS: POINTWISE APPROACH 

In the Bayesian approach, described in the previous sections, both the measurement strategy and the estimator (or guess), i.e., the estimation scheme, are so chosen as to maximize the average fidelity with respect to a given prior distribution for any N. In contrast, in the so called pointwise approach, to which this section is devoted, one's goal is to optimize the performance of a scheme at a fixed point, $\theta_0$, in parameter space. (In this section we will denote the parameters that specify the states by $\theta$ and the guesses by $\hat\theta$, as is standard in statistics.) 

The aim of this section is to present a bound on the quadratic cost, the so-called quantum Cramér-Rao bound (QCRB), and its relation to the fidelity. The QCRB is a matrix inequality which is in general not attainable. However, there is a related bound that one can expect to be saturated asymptotically: the Holevo bound. A scheme that attains this bound is asymptotically optimal from the pointwise perspective.

The pointwise approach relies on the fact that for large $N$ only quadratic cost functions become relevant.
By appropriate algebraic manipulations and averaging 
over the prior distribution one can compare this approach 
with the Bayesian one in the asymptotic limit. It is 
proved rigorously in [19] that the averaged Holevo bound 
leads to an asymptotic upper bound to the globally opti- 
mal fidelity for "smooth" qubit estimation problems, and 
for "smooth" pure state estimation problems. (We have 
a lucky coincidence for qubits, and for pure states, that 
fidelity can be expressed as a quadratic form in the esti- 
mation error of certain parameters of the state.) One can 
expect this bound to be asymptotically valid in general, 
but no rigorous proof has been given yet. 

As to whether or not the averaged Holevo bound is asymptotically saturated: there exist very good heuristic arguments that this should be true, but no rigorous proof. (Unpublished work of M. Hayashi: for large $N$ the estimation problem can be approximated, around a point obtained by a preliminary rough estimate, by a Gaussian state estimation problem, for which the Holevo bound is attained by an appropriate generalized heterodyne measurement.)

In Sec. III A we derived the optimal global scheme
for 3D states and showed that it is the same for any 
isotropic prior distribution. From the previous consider- 
ations we expect it also to be asymptotically optimal in 
the pointwise sense. We will show that this is indeed the 
case, since the optimal fidelity does coincide asymptoti- 
cally with the averaged Holevo bound. 

For 2D states the situation is more complex. Recall 
that the scheme defined by (3.37) is not optimal for ar- 
bitrary N and general isotropic priors. Nevertheless, 
Eq. (4.15) also coincides with the averaged Holevo bound. 
This comes close to a proof of the asymptotic optimal- 
ity of the scheme. A rigorous proof (see Appendix H) 
can be derived from the van Trees inequality [20] (the 
same inequality is used to get the more general results in 
[19]). Thus our approximate solution (3.37) is asymptoti- 
cally optimal both from the global and from the pointwise 
points of view. 

Both the 3D and the 2D cases confirm the conjectures 
that the averaged Holevo bound is a sharp asymptotic 
bound for fidelity, and that the global optimal scheme 
is also asymptotically optimal in the pointwise sense. 
Global asymptotic optimality does not depend on the prior or on non-local features of the figure-of-merit.

Before stating the main results, we need to introduce a bit of notation. Let $\rho$ be a density matrix parametrized by $\theta = (\theta_1, \theta_2, \ldots, \theta_p) \in \Theta \subset \mathbb{R}^p$, where $p$ is the number of parameters.² Just as in the previous sections, let us assume we perform a generalized measurement $\mathcal{O}$ on an arbitrary state $\rho(\theta)$. Recall that such a measurement is represented by a POVM $\mathcal{O} = \{O_\chi\}$, where $\chi \in \mathcal{X}$ labels the various outcomes. Let $\hat\theta_\chi$ be the estimate (or guess) of $\theta$ based on the outcome $\chi$, i.e., $\hat\theta$ is a mapping from the outcome set $\mathcal{X}$ to the parameter space $\Theta$:

$$\hat\theta : \mathcal{X} \to \Theta, \qquad \chi \mapsto \hat\theta_\chi. \tag{5.1}$$

² In the 3D case $p = 3$, $\theta = (r, \theta, \phi)$ and $\Theta = [0,1] \times [0,\pi] \times [0,2\pi)$. In the 2D case $p = 2$, $\theta = (r, \theta)$ and $\Theta = [0,1] \times [0,2\pi)$.



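A natural way of quantifying the performance of an estimator $\hat\theta$ and a measurement $\mathcal{O}$ at a point $\theta_0$ is provided by the mean square error matrix (MSE) defined by the matrix elements

$$V_{\alpha\beta}(\theta_0,\hat\theta) = \mathbb{E}_{\theta_0}\big[(\hat\theta_\alpha - \theta_{0\alpha})(\hat\theta_\beta - \theta_{0\beta})\big] = \sum_{\chi\in\mathcal{X}} p(\chi|\theta_0)\,(\hat\theta_{\chi\alpha} - \theta_{0\alpha})(\hat\theta_{\chi\beta} - \theta_{0\beta}), \tag{5.2}$$

where the dependence on $\mathcal{O}$ is understood to simplify the notation and, naturally, $p(\chi|\theta_0) = \mathrm{tr}[\rho(\theta_0) O_\chi]$. In the remaining sections of the paper $\mathbb{E}_{\theta_0}[f]$ stands for the expectation value of $f$ with respect to the probability distribution $p(\chi|\theta_0)$.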

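An estimator is said to be locally unbiased (LU) at $\theta_0$ if

$$\mathbb{E}_{\theta_0}[\hat\theta_\alpha] = \theta_{0\alpha}, \qquad \partial_\alpha \mathbb{E}_{\theta}[\hat\theta_\beta]\Big|_{\theta=\theta_0} = \delta_{\alpha\beta}, \tag{5.3}$$

where $\partial_\alpha$ is shorthand for $\partial/\partial\theta_\alpha$. Intuitively, these conditions mean that, on average, the estimator is close to the truth in a small neighborhood of $\theta_0$. When these conditions are satisfied for all possible values of $\theta_0$, the estimator is said to be uniformly unbiased, or, simply, unbiased. LU estimators play a fundamental role in the pointwise approach.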

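The Fisher information matrix (FI) is defined as

$$I_{\alpha\beta}(\theta) = \mathbb{E}_{\theta}\big[\partial_\alpha \ln p(\chi|\theta)\, \partial_\beta \ln p(\chi|\theta)\big] = \sum_{\chi\in\mathcal{X}} \frac{\partial_\alpha p(\chi|\theta)\, \partial_\beta p(\chi|\theta)}{p(\chi|\theta)}. \tag{5.4}$$

Note that the FI depends on a specific measurement $\mathcal{O}$, through the probabilities $p(\chi|\theta)$.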
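As a small illustration of definition (5.4), the sketch below evaluates the Fisher information of the purity $r$ for one specific measurement chosen here for illustration (not a scheme taken from the paper): a two-outcome projective measurement along the known Bloch direction, with outcome probabilities $(1 \pm r)/2$. The result, $1/(1-r^2)$, coincides with the $rr$ entry of the QFI quoted in Eq. (5.8). Derivatives are taken by central finite differences.

```python
def outcome_probs(r):
    """Outcome probabilities of a projective measurement along the Bloch direction."""
    return [(1.0 + r) / 2.0, (1.0 - r) / 2.0]

def fisher_information(prob_fn, theta, eps=1e-6):
    """Single-parameter Fisher information, Eq. (5.4): sum_x (dp_x/dtheta)^2 / p_x."""
    p = prob_fn(theta)
    p_plus = prob_fn(theta + eps)
    p_minus = prob_fn(theta - eps)
    total = 0.0
    for k in range(len(p)):
        dp = (p_plus[k] - p_minus[k]) / (2.0 * eps)  # central difference
        total += dp * dp / p[k]
    return total

r = 0.6
print(fisher_information(outcome_probs, r), 1.0 / (1.0 - r * r))
```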

With the above few definitions we can already give a first important result: the Cramér-Rao bound (CRB). It states that the MSE of an estimator $\hat\theta$ LU at $\theta_0$ is lower bounded by the inverse of the FI, namely,

$$V(\theta_0,\hat\theta) \ge I(\theta_0)^{-1}. \tag{5.5}$$

In spite of its fundamental character, the CRB has the drawback that the bound it provides refers to a particular measurement, not necessarily optimal. To get around this difficulty, some new definitions are required.

The symmetric logarithmic derivative (SLD), denoted by $\lambda_\alpha(\theta)$ (recall that $\alpha = 1, 2, \ldots, p$), is defined as the self-adjoint matrix that satisfies

$$\partial_\alpha \rho(\theta) = \frac{1}{2}\left[\rho(\theta)\lambda_\alpha(\theta) + \lambda_\alpha(\theta)\rho(\theta)\right]. \tag{5.6}$$

The SLDs for the 2D and 3D cases (2D and 3D models in pointwise terminology) are given in Appendix G. With this we can now define the quantum Fisher information matrix (QFI) as

$$H_{\alpha\beta}(\theta) = \mathrm{Re}\,\mathrm{tr}\,\rho(\theta)\lambda_\alpha(\theta)\lambda_\beta(\theta). \tag{5.7}$$



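E.g., for the two models studied in this paper the QFIs are

$$H_{3D} = \begin{pmatrix} \dfrac{1}{1-r^2} & & \\ & r^2 & \\ & & r^2\sin^2\theta \end{pmatrix}, \qquad H_{2D} = \begin{pmatrix} \dfrac{1}{1-r^2} & \\ & r^2 \end{pmatrix}. \tag{5.8}$$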
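The 2D entry of (5.8) can be checked directly from definitions (5.6) and (5.7). The sketch below does so with plain 2x2 complex matrices. Since Appendix G is not reproduced here, the SLD expressions used in the code, $\lambda_r = (\vec n\cdot\vec\sigma - r)/(1-r^2)$ and $\lambda_\theta = r\,\partial_\theta \vec n\cdot\vec\sigma$ for $\rho = (1 + r\,\vec n\cdot\vec\sigma)/2$ with $\vec n = (\cos\theta, \sin\theta, 0)$, are assumed forms, and all helper names are hypothetical.

```python
import math

# 2x2 complex matrices as nested lists; no external libraries needed.
I2 = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]

def lincomb(terms):
    """Linear combination sum_k c_k * M_k of 2x2 matrices."""
    return [[sum(c * m[i][j] for c, m in terms) for j in range(2)] for i in range(2)]

def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def tr(a):
    return a[0][0] + a[1][1]

def rho(r, th):
    """Equatorial qubit state with purity r and phase th."""
    return lincomb([(0.5, I2), (0.5 * r * math.cos(th), SX), (0.5 * r * math.sin(th), SY)])

def sld_r(r, th):
    c = 1.0 / (1.0 - r * r)  # assumed SLD for the purity parameter
    return lincomb([(-r * c, I2), (c * math.cos(th), SX), (c * math.sin(th), SY)])

def sld_theta(r, th):
    # assumed SLD for the phase parameter
    return lincomb([(-r * math.sin(th), SX), (r * math.cos(th), SY)])

def qfi(r, th):
    """Return (H, im): H is the real matrix of Eq. (5.7), im the largest |Im tr rho L_a L_b|."""
    slds = [sld_r(r, th), sld_theta(r, th)]
    z = [[complex(tr(mul(rho(r, th), mul(a, b)))) for b in slds] for a in slds]
    H = [[e.real for e in row] for row in z]
    im = max(abs(e.imag) for row in z for e in row)
    return H, im

H, im = qfi(0.6, 0.8)
print(H)   # compare with diag(1/(1 - r^2), r^2)
print(im)  # imaginary parts, cf. the condition used later for the 2D model
```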



The second important result of this section, due to Braunstein and Caves [21], states that for a given model all FIs are bounded from above by the QFI, i.e.,

$$I(\theta_0) \le H(\theta_0) \quad \text{for all } \{O_\chi\}, \tag{5.9}$$

from which the QCRB immediately follows:

$$V(\theta_0,\hat\theta) \ge H(\theta_0)^{-1} \quad \text{for all } \{O_\chi\}. \tag{5.10}$$

Although these bounds are measurement-independent (they depend only on the signal states and the geometric properties of the space they belong to), they have the drawback of not being always attainable.

We have seen above that $H(\theta_0)$ provides information on how small the variance of an estimator can be at $\theta_0$. There is still another remarkable property of the QFI that we will need below: its direct relation to the fidelity [14]. Indeed, from its definition [see Eq. (2.3)],



one obtains 



(5.11) 



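$$f(\theta_0, \theta_0 + \delta\theta) = 1 - \frac{1}{4} H_{\alpha\beta}(\theta_0)\,\delta\theta_\alpha\,\delta\theta_\beta + \ldots, \tag{5.12}$$

where the components of $\delta\theta$ are assumed to be small (neighboring states). Given a scheme, characterized by $(\{O_\chi\}, \hat\theta)$, the average of the fidelity over all possible outcomes is

$$\mathbb{E}_{\theta_0} f(\theta_0,\hat\theta) = \sum_{\chi\in\mathcal{X}} \mathrm{tr}[\rho(\theta_0) O_\chi]\, f(\theta_0,\hat\theta_\chi) = 1 - \frac{1}{4}\,\mathrm{tr}\, H(\theta_0) V(\theta_0,\hat\theta) + \ldots. \tag{5.13}$$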
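The quadratic expansion (5.12) can be tested numerically for the 2D model. The sketch below assumes the standard closed form of the qubit fidelity in terms of Bloch vectors, $f = \frac12\big[1 + \vec r\cdot\vec s + \sqrt{(1-r^2)(1-s^2)}\big]$ (a known result that is not quoted in this section), and compares $1 - f$ for two neighboring equatorial states with the prediction $\frac14\big[\delta r^2/(1-r^2) + r^2\delta\theta^2\big]$ obtained from $H_{2D}$ in (5.8). Function names are illustrative.

```python
import math

def qubit_fidelity(r_vec, s_vec):
    """Fidelity between two qubit states given by their Bloch vectors (assumed closed form)."""
    dot = sum(a * b for a, b in zip(r_vec, s_vec))
    r2 = sum(a * a for a in r_vec)
    s2 = sum(b * b for b in s_vec)
    return 0.5 * (1.0 + dot + math.sqrt((1.0 - r2) * (1.0 - s2)))

def bloch_2d(r, theta):
    """Bloch vector of an equatorial state with purity r and phase theta."""
    return (r * math.cos(theta), r * math.sin(theta), 0.0)

def quadratic_prediction(r, dr, dtheta):
    """Right-hand side of Eq. (5.12) with H = diag(1/(1 - r^2), r^2)."""
    return 1.0 - 0.25 * (dr * dr / (1.0 - r * r) + r * r * dtheta * dtheta)

r, theta, dr, dtheta = 0.5, 0.3, 1e-4, 2e-4
exact = qubit_fidelity(bloch_2d(r, theta), bloch_2d(r + dr, theta + dtheta))
print(1.0 - exact, 1.0 - quadratic_prediction(r, dr, dtheta))
```

The two printed infidelities should agree up to corrections of third order in the displacements.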

Our aim is, therefore, to minimize the cost

$$\mathrm{tr}\, H(\theta_0) V(\theta_0,\hat\theta). \tag{5.14}$$

An optimal measurement, $\mathcal{O}_{\rm opt}$, is thus the one that minimizes (5.14).

The formalism and results presented so far are completely general and apply to any model, i.e., to any family of states $\rho(\theta)$. We now need to introduce the so-called $N$-copy model. It is defined by the set of density matrices $\rho^N(\theta)$ of the form

$$\rho^N(\theta) = \rho(\theta)^{\otimes N}. \tag{5.15}$$

The "original" family, $\rho(\theta)$, is sometimes referred to as the single-copy quantum model. Naturally, we can talk about the variance or MSE of an estimation of the $N$-copy model, which we denote by $V^N(\theta_0,\hat\theta)$. It is not hard to convince oneself that the cost Eq. (5.14) of the optimal scheme necessarily scales as $1/N$, for large enough $N$.³ It is well-known in classical statistics [22] that under some regularity conditions the maximum likelihood (ML) estimator is asymptotically unbiased at $\theta_0$ and its MSE is equal to $I^N(\theta_0)^{-1}$, i.e., the ML estimator achieves the CRB asymptotically. It follows that for an optimal measurement $\mathrm{tr}\, H(\theta_0) I^N(\theta_0)^{-1}$ provides an attainable bound to the cost and it will scale as $1/N$ asymptotically. This lower bound on (5.14) can be expressed as

$$\frac{\mathrm{tr}\, H(\theta_0)\,\bar I^N(\theta_0)^{-1}}{N} + o(1/N), \tag{5.16}$$

where $\bar I^N = I^N/N$ is called the normalized FI. Likewise, for the asymptotic fidelity we have

$$\mathbb{E}_{\theta_0} f(\theta_0,\hat\theta_{\rm ML}) = 1 - \frac{\mathrm{tr}\, H(\theta_0)\,\bar I^N(\theta_0)^{-1}}{4N} + o(1/N), \tag{5.17}$$

which means that our optimization problem amounts to finding a measurement $\mathcal{O}_{\rm opt}$ that minimizes $\mathrm{tr}\, H(\theta_0)\,\bar I^N(\theta_0)^{-1}$. We next present a powerful measurement-independent bound to this expression: the so-called Holevo bound.

Let $G$ be a positive semi-definite matrix and

$$C^N(G) = \min_{(\mathcal{O},\hat\theta)\ \mathrm{on}\ \rho^N,\ \mathrm{LU\ at}\ \theta_0} \mathrm{tr}\, G\, V^N(\theta_0,\hat\theta), \tag{5.18}$$

where the minimization is over all pairs $(\mathcal{O},\hat\theta)$ of measurements on $\rho^N(\theta)$ and estimators for which the latter is LU at $\theta_0$ (the unbiasedness of an estimator depends on the measurement through its outcome probability distribution). Eq. (5.18) is relevant to the problem we are dealing with because its right hand side can be shown to give the $1/N$ term in (5.16) and (5.17) if $G = H(\theta_0)$. In Ref. [1] Holevo proved the following bound:

$$C^1(G) \ge C^H(G), \tag{5.19}$$



³ Just consider a scheme consisting of $N$ identical measurements, one on each copy $\rho(\theta)$. By definition the cost of the optimal scheme is less than or equal to the cost of the former, which obviously scales as $1/N$. This sets a bound on the cost of the latter that also scales as $1/N$.

where

$$C^H(G) = \min_{X}\left\{ \mathrm{tr}\, G\, \mathrm{Re}\, Z[X] + \mathrm{tr}\left|\sqrt{G}\, \mathrm{Im}\, Z[X]\, \sqrt{G}\right| \right\}. \tag{5.20}$$

In this expression $X = (X_1, X_2, \ldots, X_p)$ are hermitian matrices satisfying the following relations:

$$\mathrm{tr}\,\rho(\theta_0) X_\alpha = 0, \tag{5.21}$$

$$\mathrm{tr}\,\partial_\alpha\rho(\theta_0) X_\beta = \delta_{\alpha\beta}. \tag{5.22}$$

The minimization in (5.20) is over the set $\mathcal{S}_{\theta_0}$ of all such $X$. Finally, $Z[X]$ is the $p \times p$ matrix whose elements are given by

$$Z_{\alpha\beta}[X] = \mathrm{tr}\,\rho(\theta_0) X_\alpha X_\beta. \tag{5.23}$$



Although the Holevo bound (5.19) is not attainable but for a few simple exceptions, unpublished work by M. Hayashi shows that it is asymptotically attainable, i.e.,

$$\lim_{N\to\infty} N\, C^N(G) = C^H(G), \tag{5.24}$$

as previously mentioned in this section. It is important to point out here that practical use of Hayashi's construction would require a two-step measurement in order to saturate the bound. This is necessary because the optimal measurement and LU estimator at $\theta_0$ depend themselves on $\theta_0$, which we do not know beforehand. To overcome this difficulty, one takes an asymptotically vanishing fraction of copies, say $\sqrt{N}$ of them, and makes an initial estimate of the parameter, $\theta_{\rm ini}$. Then, on the remaining copies one performs the measurement that is optimal at $\theta_{\rm ini}$. Therefore, (5.17) and (5.24) lead us to expect that the optimal asymptotic fidelity is given by

$$\mathbb{E}_{\theta_0} f(\theta_0,\hat\theta_{\rm ML}) = 1 - \frac{1}{4N}\, C^H[H(\theta_0)] + o(1/N). \tag{5.25}$$

We next apply these results to the 3D and 2D models.



1. Holevo bound for the 3D case 

In this case $p = 3$ and it is not hard to show (see Appendix G) that there is only one "vector" of matrices $X = (X_r, X_\theta, X_\phi)$ in $\mathcal{S}_{\theta_0}$ and no minimization is thus required in (5.20). The Holevo bound is straightforwardly computed to be

$$C^H[H(\theta_0)] = 3 + 2r, \tag{5.26}$$

and (5.17) becomes

$$\mathbb{E}_{\theta_0} f(\theta_0,\hat\theta_{\rm ML}) = 1 - \frac{3 + 2r}{4N} + o(1/N). \tag{5.27}$$

Furthermore, we expect this result to hold regardless of whether the ML estimator or the optimal guess is used. This implies that for a "well behaved" prior, one should have (4.17) by simply averaging (5.27), and we re-obtain the result of the preceding section, which was computed using the Bayesian approach, with much less effort. Eq. (5.27) was also obtained by Matsumoto and Hayashi [12] with an estimation strategy similar to the one developed in Section III A.



2. Holevo bound for the 2D case 

In the 2D model the SLDs satisfy

$$\mathrm{Im}\,\mathrm{tr}\,\rho(\theta_0)\lambda_\alpha(\theta_0)\lambda_\beta(\theta_0) = 0. \tag{5.28}$$

It is not difficult to check that in this situation the QCRB is asymptotically attainable,⁴ i.e.,

$$C^H(G) = \mathrm{tr}\, G\, H(\theta_0)^{-1}. \tag{5.29}$$

Indeed, the choice $X_\alpha = \sum_\beta (H^{-1})_{\alpha\beta}(\theta_0)\,\lambda_\beta(\theta_0)$ achieves this. Hence $C^H[H(\theta_0)] = 2$ and

$$\mathbb{E}_{\theta_0} f(\theta_0,\hat\theta_{\rm ML}) = 1 - \frac{1}{2N} + o(1/N), \tag{5.30}$$

from which (4.15) follows for "well behaved" priors. This strongly supports the claim that the 2D measurement scheme defined by Eq. (3.37) is indeed asymptotically optimal. Appendix H contains the rigorous proof.



VI. CONCLUSIONS 

We have presented a detailed analysis of the optimal 
estimation of qubit mixed states given a number N of 
identical copies. Our results apply to arbitrary N, finite 
or asymptotically large. 

For general states (3D) we have obtained that the structure of the optimal measurement is based on the decomposition of the signal states in irreducible blocks under the action of the symmetric group. The scheme is essentially unique, valid for any isotropic prior distribution and any number of copies. This optimal scheme has the nice property that it can be regarded as two independent protocols performed sequentially: that for estimating the purity $r$ of the state and that for estimating its orientation $\vec n$ in the Bloch sphere. It turns out that the estimation of the purity only exploits rotationally invariant properties of the signal states, and a measurement of the Casimir operator $\vec J^{\,2} = J_x^2 + J_y^2 + J_z^2$ is optimal. In other words, the estimate of $r$ only depends on $j$, which characterizes the SU(2) invariant subspaces. This should not come as a surprise since the purity itself is rotationally invariant and so are the priors considered here. The estimation of the orientation is formally equivalent to pure state estimation with $2j$ copies. As an illustration of our procedure, we have obtained closed expressions of the fidelity for the particularly important Bures prior. Results for other priors can be easily obtained with the techniques presented here.

⁴ A theorem by Matsumoto [12] states that the QCRB is asymptotically attainable if and only if (5.28) holds.

In 2D, if one wants to do precisely optimal estimation 
for any N, there is a subtle interplay between the es- 
timation of the purity and the estimation of the phase 
and they are no longer independent, although they are 
asymptotically so. Also contrasting with 3D is that the 
structure of the optimal POVM depends on the prior. 
The roots of this unconventional behavior lie in the different group structure of 2D states. Here the relevant group is U(1) [instead of SU(2)] and $j$ is not the only invariant; the magnetic number $m$ is also invariant under U(1). Actually, the purity-phase interplay can be traced back to this symmetry property. In spite of these
difficulties, we have reduced the problem of obtaining the 
optimal POVM for any isotropic prior to a rather triv- 
ial maximization problem [recall Eq. (3.34)]. We have 
also obtained a prior independent POVM that is indis- 
tinguishable from the optimal one for any practical pur- 
poses. Furthermore, it separates purity and phase esti- 
mation exactly for all N and is asymptotically optimal. 

The asymptotic behaviour of the estimation procedure has also been a central issue of our work. The asymptotic fidelity in 3D has the simple form $F = 1 - (3 + 2\langle r\rangle)/(4N)$, where $\langle r\rangle$ is the mean purity with respect to the prior. This result is proved here for isotropic priors within our Bayesian approach. It is worth emphasizing that so far the asymptotic expression was only known for the particular case of the Bures prior [8]. In 2D, the asymptotic fidelity computed with the fixed POVM described above is simply $F = 1 - 1/(2N)$, independently of the prior.

We have studied the asymptotic behavior also from the 
pointwise approach, which is far more common among 
statisticians. The main advantage of the pointwise ap- 
proach over the Bayesian one is that it provides bounds 
on the asymptotic mean square error (as well as on any 
other quadratic loss function) that can be easily computed. These bounds correspond, by second order expansion of the figure-of-merit, to bounds on the average fidelity which can be shown to be rigorous in many cases [19], including those studied in this paper. The drawback of the approach is that though one can heuristically
expect these bounds to be asymptotically sharp, and one 
can propose two-stage measurement schemes which can 
be hoped to do the job, a lot of hard work is needed in 
each case to prove that they can be achieved. In contrast 
with the 3D case where all the results we have worked out 
from the Bayesian approach are rigorous, the optimality 
in the asymptotic regime of the 2D estimation scheme 
defined by (3.37) or (3.38) required some further work. 
Here we used the pointwise approach to fill the gap. The 
application of the van Trees inequality [20] to 2D in Appendix H yields the asymptotic bound on the fidelity in a particularly elegant and straightforward way. In turn, this bound provides the optimality proof.

Altogether, the fact that the results obtained from the pointwise approach coincide with those derived from the Bayesian framework gives further strong support for the heuristic principle that the averaged lower bound from the pointwise approach is an asymptotically sharp lower bound for the global approach; and moreover that the chosen prior distribution and, to a lesser extent, the figure-of-merit have asymptotically little impact on the behaviour of the solution.

There are two extensions of our work that can be read- 
ily addressed. Here, we have considered the full esti- 
mation of a qubit mixed state, however for some appli- 
cations only partial knowledge of the state, such as its 
purity or its orientation, may be required. The tech- 
niques developed in this work can be easily adapted to 
these situations (see [18] and [23]). A second line of 
work concerns the use of more realistic measurements, in 
particular those that can be implemented with current 
technology. In this work we have considered the most 
general measurements allowed by Quantum Mechanics. 
They yield the maximum theoretical accuracy that can 
possibly be achieved, and thus provide a bound (and a 
measuring rod) for the accuracy of any other estimation 
scheme. However, they involve joint operations on the 
whole sample of states that in general are difficult to 
implement in a laboratory. It is thus of great practical 
relevance to study schemes based on local von Neumann measurements. Preliminary results were presented in [8].
There, it was found that, for some tomographic schemes, 
the rate at which the fidelity approaches unity for a Bu- 
res prior distribution is 1 — F ~ i.e., there is 
a qualitative difference with the optimal measurements. 
Present work in progress suggests that by using classical communication the precision rate can be similar to that of the optimal collective scheme, $1 - F \sim 1/N$, but the coefficient of the $1/N$ term is strictly larger than the optimal one, and corresponds to the result from the pointwise
approach obtained in [3].

APPENDIX A: BLOCK-DIAGONAL FORM OF $\rho^{\otimes N}$

One may use the symmetric group $S_N$ to write $\rho^{\otimes N}$ in the block-diagonal form (2.10), much in the same way as it is used to obtain the SU(2) Clebsch-Gordan decomposition

$$\left(\mathbf{\tfrac{1}{2}}\right)^{\otimes N} = \bigoplus_{j=0,1/2}^{J} n_j\, \mathbf{j} \qquad (J = N/2) \tag{A1}$$

(the multiplicity, $n_j$, is computed in Appendix B). However, at variance with the SU(2) case, where all Young frames have a single row, we here must also consider those with two rows, because

$$\det\rho = \frac{1 - r^2}{4} \tag{A2}$$

(instead of unity). Hence, each two-box column of a frame contributes a multiplicative factor $\det\rho$.

With this observation, one can easily obtain the expression of the blocks $\rho_{Nj}$ as follows. A generic Young frame with $N$ boxes has the shape

[Young frame with two rows: $N/2 - j$ two-box columns followed by $2j$ single-box columns] (A3)

Each of the $N/2 - j$ double columns gives a factor $\det\rho$. The remaining $2j$ single columns correspond to a fully symmetric tensor on which SU(2) acts irreducibly. In the basis of the irreducible subspace of the representation $\mathbf{j}$, this tensor can be written as the matrix which we denote by $\rho_j$. Hence

$$\rho_{Nj} = \left(\frac{1 - r^2}{4}\right)^{N/2 - j} \rho_j. \tag{A4}$$

We now note that for $\vec r = r\hat z$ the matrices $\rho^{\otimes N}$, $\rho_{Nj}$ and $\rho_j$ are all diagonal and can thus be obtained without much effort. The result is

$$\rho_j = \sum_{m=-j}^{j} \left(\frac{1 - r}{2}\right)^{j-m}\left(\frac{1 + r}{2}\right)^{j+m} |jm\rangle\langle jm|. \tag{A5}$$

For arbitrary $\vec r$ covariance implies

$$\rho_j = \sum_{m=-j}^{j} \left(\frac{1 - r}{2}\right)^{j-m}\left(\frac{1 + r}{2}\right)^{j+m} U(\vec n)\,|jm\rangle\langle jm|\,U^\dagger(\vec n). \tag{A6}$$

Notice that, in spite of what the notation might suggest, the matrices $\rho_j$ are not proper density matrices, as $\mathrm{tr}\,\rho_j \neq 1$.

APPENDIX B: THE MULTIPLICITY OF THE REPRESENTATION j

Using Young tableaux techniques, there is a simple way to compute the multiplicity $n_j$, (2.11), with which the representation $\mathbf{j}$ shows up in the Clebsch-Gordan decomposition of $(\mathbf{1/2})^{\otimes N}$ (this tensor product is denoted by $\square^{\otimes N}$ in the present context).

The Young frame in (A3) can be denoted by $\lambda = [\lambda_1, \lambda_2] = [N/2 + j, N/2 - j]$ (this is a standard notation where $\lambda_k$ is the number of boxes in the $k$-th row of the frame). This very same frame (A3) is equivalent to a single row of $2j$ boxes, i.e., to $[2j]$, which denotes the representation $\mathbf{j}$ of SU(2).

The recipe for computing SU(2) Clebsch-Gordan decompositions [24] applied to $\square^{\otimes N}$ amounts to the following. First label $N$ boxes each with an integer number from 1 to $N$. Then, starting with box number one and proceeding sequentially, build (and keep account of) all possible Young tableaux such that (i) they have at most two rows and (ii) the full sequence of integers formed by reading right to left in the first row and then in the second is admissible.⁵ The number of occurrences of (A3) is precisely $n_j$. But the very same recipe gives us all standard Young tableaux⁶ of shape $\lambda = [N/2 + j, N/2 - j]$. Hence $n_j$ equals the number, $f_\lambda$, of such tableaux.

Recalling the Frobenius determinantal formula [25],

$$f_\lambda = N!\,\det\left[\frac{1}{(\lambda_k - k + l)!}\right], \tag{B1}$$

we get

$$n_j = N!\begin{vmatrix} \dfrac{1}{(N/2 + j)!} & \dfrac{1}{(N/2 + j + 1)!} \\[2mm] \dfrac{1}{(N/2 - j - 1)!} & \dfrac{1}{(N/2 - j)!} \end{vmatrix}. \tag{B2}$$

This determinant is readily seen to give (2.11).
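The multiplicity $n_j$ discussed in Appendix B can be checked with a few lines of integer arithmetic. The sketch below assumes the closed form $n_j = \binom{N}{N/2-j}\frac{2j+1}{N/2+j+1}$ (the two-row hook-length result, which for $N = 2n$ is the left-hand side of identity (F1)) and verifies that it equals the difference of binomials on the right-hand side of (F1), and that the block dimensions add up to $2^N$. Spins are passed as the integer $2j$ to avoid half-integers; the function names are hypothetical.

```python
from math import comb

def multiplicity(N, two_j):
    """n_j for N qubits and spin j = two_j / 2 (two_j must have the parity of N)."""
    k = (N - two_j) // 2  # k = N/2 - j, the number of two-box columns in the Young frame
    return comb(N, k) * (two_j + 1) // (N - k + 1)

def multiplicity_as_difference(N, two_j):
    """Same quantity written as a difference of binomials, cf. identity (F1)."""
    k = (N - two_j) // 2
    return comb(N, k) - comb(N, N - k + 1)  # math.comb returns 0 when the lower index exceeds N

def total_dimension(N):
    """Sum over j of n_j * (2j + 1); it should reproduce dim of the N-qubit space, 2^N."""
    return sum(multiplicity(N, tj) * (tj + 1) for tj in range(N % 2, N + 1, 2))

print([multiplicity(4, tj) for tj in (0, 2, 4)])  # multiplicities of j = 0, 1, 2 for N = 4
print(total_dimension(4), 2 ** 4)
```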



APPENDIX C: CLOSED EXPRESSION OF THE 
FIDELITY USING A BURES PRIOR IN 3D 

The explicit expressions of the coefficients Vj, Vj 



[Eqs. (3.19) and (3.18)] are 
v° = 2 J dr 



1 w{r) fl-r''V~'^^ A + r' 



r V 4 



and 



(CI) 



(C2) 



⁵ A sequence of integers $p, q, r, \ldots$ is admissible if at any point in the sequence at least as many 1's have occurred as 2's, at least as many 2's have occurred as 3's, etc.

⁶ A Young tableau is said to be standard if its labels increase from left to right along the rows and from top to bottom along the columns.

with



Vj 



i + 1 



' dr 



^ w{r) / 1 — r 



2 \ --^-i 



w{r) /I — r 



2 \ -f-J 



1 + r 



1 + r 



(C3) 



To obtain these expressions we have recalled (2.12) and (2.13) and defined $w(r) = w(-r)$ for $-1 < r < 0$ to extend the $r$-integration to the interval $[-1, 1]$.

Consider now the Bures prior [14], which is commonly regarded as the natural uniform distribution in the Bloch sphere, since it follows from the metric induced by the fidelity [9, 10]. It is given by

$$d\rho = \frac{4}{\pi}\,\frac{r^2\, dr}{\sqrt{1 - r^2}}\, dn, \tag{C4}$$

which implies $w(r) = (4/\pi)\, r^2 (1 - r^2)^{-1/2}$. In this case the integration in (C1) and (C3) can be performed analytically. For simplicity, we will consider an even number of copies $N = 2n$ ($J = n$). By making extensive use of



r I 1 — r\ I 1 + r\ p — a 



-1 2 V 2 
where 



B{a,/3) = 



f3 + a 
r(a)r(/3) 



B{a,f3), (C5) 



r(a + /3) 

is the standard Euler Beta function, we obtain 

8d,- 



(C6) 



B{n-j + l,n + j + 2). (C7) 



^ 7r(2n + 3) 
Similarly, 



7r(2n + 3)' 



and 



4d, 



Ui = 



^ 7r(2n + 2) 
which lead to 



B(n-i + i,n + i + |) 



(C8) 



(C9) 



. ^ Hi r(n-j + l)r(n + j + |) 

^ TT r(2n + 4) ■ ^ ' 

Putting the various pieces together we finally obtain the 
closed expression: 



2(2j + 1)2 



TT ^ (2n + 3)(2n + 2)(2n + 1) 



1 + 



j r(n-j + i)r(n+j + |) 



^ + j + l r(n-j + l)r(n + j + l) 



• (Cll) 



APPENDIX D: COVARIANT POVMS FOR 
2D STATES 

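For the sake of completeness, in this appendix we give a simple proof specialized to the 2D case of a more general result concerning the optimality of covariant (continuous) POVMs [1]. More precisely, we wish to prove that for any given POVM, $\{O_\chi\}$, there is always a covariant (continuous) one, with elements

$$\tilde O_{\chi\phi} = U(\phi)\,\Omega_\chi\, U^\dagger(\phi), \tag{D1}$$

which gives the same average fidelity for a suitable positive operator $\Omega_\chi$. The proof goes as follows.

In the 2D case the average fidelity can be written as (in this section the integration limits 0 and $2\pi$ are understood)

$$F = \sum_\chi \int \frac{d\theta}{2\pi}\, f(\theta_\chi - \theta, R_\chi)\, \mathrm{tr}[\rho(\theta) O_\chi], \tag{D2}$$

where $\theta_\chi$ ($\theta$) is the angle between $\vec R_\chi$ ($\vec r$) and the $x$-axis, and we denote the fidelity by $f(\theta_\chi - \theta, R_\chi)$ to emphasize the fact that in 2D it is a function of the difference of these two angles. Note also that we drop the explicit dependence on $r$, which does not play any role in the proof. Thus, e.g., we denote the mixed state $\rho(\vec r)$ simply as $\rho(\theta)$. Proving our statement amounts to proving that the POVM with elements and associated guess given by

$$\tilde O_{\chi\phi} = U(\phi - \theta_\chi)\, O_\chi\, U^\dagger(\phi - \theta_\chi), \qquad (\phi, R_\chi), \tag{D3}$$

gives the same fidelity as $\{O_\chi\}$. Note that (D3) defines $\Omega_\chi$ in (D1) through

$$\tilde O_{\chi\phi} = U(\phi)\left[U^\dagger(\theta_\chi)\, O_\chi\, U(\theta_\chi)\right] U^\dagger(\phi) = U(\phi)\,\Omega_\chi\, U^\dagger(\phi). \tag{D4}$$

In formulae we wish to prove that $\tilde F = F$, where

$$\tilde F = \sum_\chi \int \frac{d\phi}{2\pi}\int \frac{d\theta}{2\pi}\, f(\phi - \theta, R_\chi)\, \mathrm{tr}[\rho(\theta)\,\tilde O_{\chi\phi}] \tag{D5}$$

is the fidelity we obtain with $\{\tilde O_{\chi\phi}\}$. We also have to prove that $\{\tilde O_{\chi\phi}\}$ in (D3) is indeed a POVM, namely, that

$$\sum_\chi \int \frac{d\phi}{2\pi}\, \tilde O_{\chi\phi} = 1. \tag{D6}$$

Let us start by proving (D6). We simply change variables $\phi - \theta_\chi \to \phi'$ and use the invariance of the U(1) Haar measure, which in this case is the trivial identity $\int_0^{2\pi} d\phi\, g(\phi) = \int_0^{2\pi} d\phi\, g(\phi + \alpha)$ satisfied by any periodic function $g$ of period $2\pi$. We have

$$\sum_\chi \int \frac{d\phi}{2\pi}\, \tilde O_{\chi\phi} = \int \frac{d\phi'}{2\pi}\, U(\phi')\Big[\sum_\chi O_\chi\Big] U^\dagger(\phi') = \int \frac{d\phi'}{2\pi}\, U(\phi') U^\dagger(\phi') = 1. \tag{D7}$$

We use the same logic to prove that $\tilde F = F$: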

$$\tilde F = \sum_\chi \int \frac{d\phi}{2\pi}\int \frac{d\theta}{2\pi}\, f(\theta_\chi + \phi - \theta, R_\chi)\, \mathrm{tr}[\rho(\theta)\, U(\phi)\, O_\chi\, U^\dagger(\phi)]. \tag{D8}$$

We now use that $U^\dagger(\phi)\rho(\theta)U(\phi) = \rho(\theta - \phi)$ and make the change of variable $\theta \to \theta - \phi$ to obtain $\tilde F = F$.

If $R_\chi = R$ for all $\chi$ (this is the case if the estimation of $r$ is entirely based on $j$, as in the last part of Section III B), we can replace the POVM elements by

$$\tilde O_\phi = \sum_\chi \tilde O_{\chi\phi}. \tag{D9}$$

This is equivalent to

$$\tilde O_\phi = U(\phi)\,\Omega\, U^\dagger(\phi), \tag{D10}$$

where the positive operator $\Omega$ can be expressed in terms of $\Omega_\chi$ in (D1) simply as $\Omega = \sum_\chi \Omega_\chi$. The proof that $\{\tilde O_\phi\}$ achieves the same fidelity is straightforward and it amounts to pulling the sum over $\chi$ into or out of the trace in Eqs. (D5) and (D8), which we are entitled to do because we are assuming that $R_\chi$ is now independent of $\chi$.

Using the results in Ref. [17], it is easy to show that for any given covariant (continuous) POVM with elements given by (D1) there is always a POVM with a finite number of elements $\tilde O_a = U(\phi_a)\,\Omega\, U^\dagger(\phi_a)$, $a = 0, 1, 2, \ldots, M - 1$, which achieves the same fidelity for a suitably large $M$. The angles $\phi_a$ can be chosen to be $\phi_a = 2\pi a/M$, $a = 0, 1, 2, \ldots, M - 1$.



APPENDIX E: COMPUTATION OF THE 



We note that the two coefficients and cj j^ are bi- 
nomial sums modulated by smooth functions of m in a 
neighborhood of m = 0. More precisely, 



4= E 



2j \ 1 



j - m I 



.^fc(m), (E3) 



where (pk{m), which can be read off from (E2) for k = j, 
j — 1, can be Taylor-expanded at m = 0. For large j this 
expansion is 

Mm) = ^-^-- + ^2+oir'/% 

2m (2 3 



(Pj_i(m) = -r + 7 - 72 ™ 



J \J J 

o 3 4 

zm m 



J r 



(E4) 



Here the power counting is done by noticing that $m$ is of order $\sqrt{j}$, since the sum

$$S_q = \sum_{m=-j}^{j} \binom{2j}{j - m}\frac{m^q}{2^{2j}} \tag{E5}$$

is $O(j^{q/2})$ for $q$ even and vanishes for $q$ odd, as is well known. In particular, we have $S_0 = 1$, $S_2 = j/2$, $S_4 = j(3j - 1)/4$.
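The moments quoted after (E5) are easy to confirm with exact rational arithmetic. The sketch below evaluates $S_q = \sum_{m=-j}^{j}\binom{2j}{j-m}\, m^q / 2^{2j}$ for a given $2j$ (passed as an integer so that half-integer spins are also covered); the function name is illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_moment(two_j, q):
    """S_q of Eq. (E5): symmetric-binomial average of m^q, with m = -j, ..., j."""
    j = Fraction(two_j, 2)
    total = Fraction(0)
    for i in range(two_j + 1):  # i = j - m runs over 0, ..., 2j
        m = j - i
        total += comb(two_j, i) * m ** q
    return total / 2 ** two_j

j = Fraction(5)
print(binomial_moment(10, 0), binomial_moment(10, 2), binomial_moment(10, 4))
print(1, j / 2, j * (3 * j - 1) / 4)               # closed forms quoted in the text
print(binomial_moment(10, 1), binomial_moment(10, 3))  # odd moments vanish
```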

With all this information we obtain c^- = 1 — l/{4j), 
c^j_-^ = 1 — 3/(4j), and finally have 



cL = 4 + {4-4_,)im-j) + 0[{m-j)'] 
= 2(^-27j + 27 + '^[("^-^-)]- 



(E6) 



APPENDIX F: EXPLICIT COMPUTATION OF 
THE ASYMPTOTIC FIDELITY 



In this Appendix we give an approximation to c^, de- 
fined in Eq. (3.42), of the form ~ a j + ^jijn — j) valid 
for j large enough. 

Recalling the Wigner formula 



d^^,(e) = ^{j+rn)\{j-m)\{3+m')\{j-m')\ 



X 



(-1)* (cos|)'^+'"'-"-'' (-sin I) 



(i — m, — iV.a -\- m! — 



i=0 
we obtain 



{j — m — iy.{j + m' — i)\{i + m — m')\i\ 



J ~ ^ \ j - m / y 7 -i-m-l- 1' 

J-mJ y j + m + 1 22j-ij 



2j 



j-m 



4-. = E 



m=-j 



j — m m(l + m) 



(El) 



(E2) 



Here we present in some detail the procedure we have used to evaluate the sum of (4.13) in the large $N = 2n$ limit. We first focus on 2D states and later comment on the main differences with 3D.

In the two cases, we write $n_j$ as the right hand side of the identity

$$\frac{d_j}{n + j + 1}\binom{2n}{n - j} = \binom{2n}{n - j} - \binom{2n}{n + j + 1}. \tag{F1}$$

The 2D case

After plugging Eqs. (3.43) and (4.2) into Eq. (4.13), 

we have 



A2D=E 

j=0 



2n 
n-j 



2n 
n+j + 1 
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w{r) 



— (r — > — r) 



;i , wir) 



/ 



"1 








n 





1-r 



1 + r 



— (r — > — r) 



(F2) 



We next multiply the powers of (1 ± r)/2 that are ex- 
plicitly given in this equation by the first binomial. Like- 
wise, we multiply those denoted by (r — r) by the 
second binomial. In the resulting expressions, we next 
change the summation indexes according to n — j = k 
and n + j + 1 = k, respectively, and do similar changes 
in the remaining crossed terms. After some algebra, we 
have 

1 ^ w{r) (1 + r 
A2D = - / dr^ \ — - 
n Jo r [ 2 

n 2n 

Bk{r)^k{r) + Yl Bk{r)^k-i{r) 

k=0 k=n+l 

1-r, 



\r —rl 



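where $B_k(r)$ is defined by

$$B_k(r) = \binom{2n}{k}\left(\frac{1 - r}{2}\right)^{k}\left(\frac{1 + r}{2}\right)^{2n - k} \tag{F4}$$

and

$$\Phi_k(r) = \sqrt{k(2n - k)(1 - r^2)} + (n - k)\, r - \frac{1}{4}. \tag{F5}$$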

Since the coefficients $B_k(r)$ are the terms of a binomial series, for large $n$ only those for which $k \approx n(1 - r) < n$ (or equivalently $2n - k > n$) give a significant contribution to the fidelity, whereas the rest fall off exponentially with $n$. This enables us to expand the factor $\sqrt{(k - 1)(2n - k + 1)}$ in $\Phi_{k-1}(r)$ as a power series in $1/k$ and $1/(2n - k)$ and obtain the relation



$fe_i(r) = $fc(r) + 



k 



2n-k 



2n-k 



k 



+ r + o{l/n) (F6) 



which we use in the second sum of (F3). We further 
define ^k{r) = $fe_i(r) - $fc(r) + o(l/n). It satisfies 
^'fc('') = ^'^2n-k{—r), as can be read off from (F6). 

The leading contributions come from the terms that 
contain $fc(r), and the corresponding term in [r — > — r]. 
They combine into a single sum from k = to k = 2n. 
The rest of the terms [those proportional to \E';c(r) and 



r)] are subleading and can be simplified using the 
change of indexes k ^ 2n — k. The result can be cast as 



AsD = - / 

n Jo 



dr 



w{r) 



1 



2n 
fe=0 



Bk{r)^k{r) 



+ 



fc=0 



Sfe(r-)*fc(r) 



(F7) 



We readily see that the first sum (as well as the corre- 
sponding one obtained by the substitution r — r) is a 
binomial sum modulated by the function $fc(r), analo- 
gous to (E3) in Appendix E, and can be computed along 
the same line. This sum is peaked at fc w n(l — r), as we 
have already mentioned, which suggests expanding $fc(r) 
in powers of fc — n(l — r). More precisely, one can check 
that 



\[
\Phi_k(r) = n - \frac{1}{4} - \frac{[k - n(1-r)]^2}{2n(1-r^2)} + o(1/n)
\tag{F8}
\]

[the power counting is simply $k - n(1-r) = O(\sqrt{n})$]. Recalling 
that the lowest moments, $S_q(r) = \sum_{k=0}^{2n} B_k(r)\,[k - n(1-r)]^q$, 
of the binomial series given by (F4) are 
$S_0(r) = 1$, $S_2(r) = (n/2)(1-r^2)$, we obtain 



\[
\sum_{k=0}^{2n} B_k(r)\,\Phi_k(r) = n - \frac{1}{2} + o(1/n).
\tag{F9}
\]
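The moments quoted above and the value of the sum in (F9) can be verified by direct summation. The following is a minimal sketch using only the Python standard library. It assumes $0 < r < 1$ and takes $\Phi_k(r)$ from (F5) with the constant $-1/4$. The names `binom_weights`, `moment` and `phi_sum` are hypothetical helpers introduced for this check.

```python
import math

def binom_weights(n, r):
    """B_k(r) of (F4) for k = 0..2n, evaluated in log space to avoid overflow."""
    p, q = (1 - r) / 2, (1 + r) / 2
    big_n = 2 * n
    logs = [math.lgamma(big_n + 1) - math.lgamma(k + 1) - math.lgamma(big_n - k + 1)
            + k * math.log(p) + (big_n - k) * math.log(q)
            for k in range(big_n + 1)]
    return [math.exp(v) for v in logs]

def moment(n, r, order):
    """S_q(r): moment of order q of the weights about k0 = n(1 - r)."""
    k0 = n * (1 - r)
    return sum(b * (k - k0) ** order for k, b in enumerate(binom_weights(n, r)))

def phi(k, n, r):
    """Phi_k(r) of (F5); the constant -1/4 is the value implied by (F8)-(F9)."""
    return math.sqrt(k * (2 * n - k) * (1 - r * r)) + (n - k) * r - 0.25

def phi_sum(n, r):
    """Left-hand side of (F9): sum over k of B_k(r) Phi_k(r)."""
    return sum(b * phi(k, n, r) for k, b in enumerate(binom_weights(n, r)))

if __name__ == "__main__":
    n, r = 200, 0.5
    print(moment(n, r, 0), moment(n, r, 2), (n / 2) * (1 - r * r))
    print(phi_sum(n, r), n - 0.5)
```

The second moment equals the binomial variance $2n\,\tfrac{1-r}{2}\,\tfrac{1+r}{2}$, and it is this quantity that turns the quadratic term of (F8) into the extra $-1/4$ in (F9).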



To evaluate the second sum in (F7) we again use the 
approximation $B_k(r) \approx \delta[k - n(1-r)]$ [see Eq. (4.5) 
and the comments below it], along with the substitution 
$\sum_{k=0}^{n-1} \to n\int_0^1 ds$, where $s = k/n$. This yields 



\[
\sum_{k=0}^{n-1} B_k(r)\,\Psi_k(r) = O(1/n).
\tag{F10}
\]



The counterpart of (F10) in the term denoted by $[r \to -r]$ 
in Eq. (F7) gives no contribution since $\delta[k - n(1+r)]$ 
lies outside the $s$-integration range. Collecting the 
various pieces we finally obtain 



\[
\Delta_{2D} = \left(1 - \frac{1}{2n}\right)\int_0^1 dr\,w(r) + o(1/n).
\tag{F11}
\]



The 3D case 



The 3D case is quite similar. Our starting point is now 
Eqs. (C1) and (C2). We proceed as above to obtain 



i2j + l)r - 1 



n+j+l 



2(J + 1) 
— (r ^ — r) 



1 -r 



(F12) 
and a similar expression for $V_j$. From them we compute 
$\Delta_{3D}$ to be 



i3D = 



1 " 



2n 
n-j 



^ w{r) 
{2j + l)jr-j' 



2n 
n + j + 1 



20+1) 



n-j 



n+j+1 



(F13) 



This expression can be cast in the form of (F3), where 
now 



$fe(r) = ^/k{2n - k){l - r"^) + (n - k)r 

1 ^v? - k{2n - k) 

2 - k{2n~k) + l' 



(F14) 



One can check that $\Psi_k(r)$ is again defined by (F6) and 
$\Delta_{3D}$ can thus be expressed in the form (F7). The first 
sum is again Taylor-expanded about $k = n(1-r)$. Using 
the moments of the binomial series defined by (F4) and 
keeping only the relevant order we obtain 



\[
\sum_{k=0}^{2n} B_k(r)\,\Phi_k(r) = n - \frac{3 + 2|r|}{4} + O(1/n).
\tag{F15}
\]



Note that we cannot drop the absolute value since the 
integral over $r$ extends to the interval $[-1, 1]$ [see, e.g., 
Eqs. (F12) and (F13)]. 

To evaluate the second sum in (F7) we proceed as in 
the previous 2D case, and find that (F10) still holds. 
Finally, we obtain 



\[
\Delta_{3D} = \int_0^1 dr\,w(r)\left(1 - \frac{3 + 2r}{4n}\right) + o(1/n).
\tag{F16}
\]



APPENDIX G: SLDS AND $C^{H}[H(\boldsymbol{\theta}_0)]$ FOR THE 
3D MODEL 

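The SLDs of the 3D model can be calculated to be 

\begin{align}
\lambda_r &= \frac{1}{1+r}\,\frac{\mathbb{1} + \vec{n}\cdot\vec{\sigma}}{2}
 - \frac{1}{1-r}\,\frac{\mathbb{1} - \vec{n}\cdot\vec{\sigma}}{2}, \tag{G1}\\
\lambda_\theta &= r\,\partial_\theta \vec{n}\cdot\vec{\sigma}, \tag{G2}\\
\lambda_\phi &= r\,\partial_\phi \vec{n}\cdot\vec{\sigma}. \tag{G3}
\end{align}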

[In this appendix we drop the arguments $\boldsymbol{\theta} = (r, \theta, \phi)$ 
and $\boldsymbol{\theta}_0$ wherever no confusion arises.] The two SLDs of the 
2D model, $\lambda_r$ and $\lambda_\theta$, are obtained by simply setting $\theta = \pi/2$ 
and then replacing $\phi$ by $\theta$ in the above expressions. 

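To compute $C^{H}(H)$ we first need $X = (X_r, X_\theta, X_\phi)$, 
which are completely fixed by the conditions 

\begin{align}
X_\alpha^\dagger &= X_\alpha, \tag{G4}\\
\operatorname{tr}\rho X_\alpha &= 0, \tag{G5}\\
\operatorname{tr}\partial_\alpha\rho\, X_\beta &= \delta_{\alpha\beta}. \tag{G6}
\end{align}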



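Hermiticity, Eq. (G4), requires 

\[
X_\alpha = a_\alpha \mathbb{1} + \vec{b}_\alpha\cdot\vec{\sigma}, \qquad \alpha = r, \theta, \phi.
\tag{G7}
\]

The conditions (G5) yield 

\[
a_\alpha + r\,\vec{b}_\alpha\cdot\vec{n} = 0,
\tag{G8}
\]

and conditions (G6) give 

\begin{align}
\vec{b}_r &= \vec{n}, \tag{G9}\\
\vec{b}_\theta &= \frac{1}{r}\,\partial_\theta\vec{n}, \tag{G10}\\
\vec{b}_\phi &= \frac{1}{r\sin^2\theta}\,\partial_\phi\vec{n}. \tag{G11}
\end{align}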



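These together with (G8) imply $a_r = -r$, $a_\theta = 0$, 
and $a_\phi = 0$. Hence, the only set of matrices satisfying 
(G4)-(G6) is 

\[
X_r = -r\,\mathbb{1} + \vec{n}\cdot\vec{\sigma}, \qquad
X_\theta = \frac{1}{r}\,\partial_\theta\vec{n}\cdot\vec{\sigma}, \qquad
X_\phi = \frac{1}{r\sin^2\theta}\,\partial_\phi\vec{n}\cdot\vec{\sigma}.
\tag{G12}
\]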



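To compute the Holevo bound we only need to take 
traces of the form $\operatorname{tr}\rho X_\alpha X_\beta$. A straightforward calculation 
gives 

\begin{align}
\operatorname{Re} Z[X] &= H_{3D}^{-1}, \tag{G13}\\
\operatorname{Im} Z[X] &=
\begin{pmatrix}
0 & 0 & 0\\
0 & 0 & \dfrac{1}{r\sin\theta}\\
0 & -\dfrac{1}{r\sin\theta} & 0
\end{pmatrix}. \tag{G14}
\end{align}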



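Therefore 

\begin{align}
\operatorname{tr} H_{3D}\,H_{3D}^{-1} &= 3, \tag{G15}\\
\operatorname{tr}\left|\sqrt{H_{3D}}\;\operatorname{Im} Z[X]\,\sqrt{H_{3D}}\right| &= 2r, \tag{G16}
\end{align}

and we obtain (5.26). 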
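The algebra of this appendix can be cross-checked with explicit $2\times 2$ matrices. The sketch below is an illustration under stated assumptions rather than the authors' calculation. It uses the standard parametrization $\vec{n} = (\sin\theta\cos\phi, \sin\theta\sin\phi, \cos\theta)$, the standard diagonal QFI of a qubit in these coordinates, $H = \mathrm{diag}[1/(1-r^2), r^2, r^2\sin^2\theta]$, and the convention $Z_{\alpha\beta} = \operatorname{tr}\rho X_\alpha X_\beta$. All function names are hypothetical.

```python
import math

ID = [[1, 0], [0, 1]]

def pauli_dot(v):
    """v . sigma as a 2x2 complex matrix (nested lists)."""
    x, y, z = v
    return [[z, x - 1j * y], [x + 1j * y, -z]]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def scale(c, a):
    return [[c * a[i][j] for j in range(2)] for i in range(2)]

def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def tr(a):
    return a[0][0] + a[1][1]

def model(r, th, ph):
    """Return rho, its derivatives w.r.t. (r, theta, phi), and the X of (G12)."""
    n = (math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th))
    dn_th = (math.cos(th) * math.cos(ph), math.cos(th) * math.sin(ph), -math.sin(th))
    dn_ph = (-math.sin(th) * math.sin(ph), math.sin(th) * math.cos(ph), 0.0)
    rho = scale(0.5, add(ID, scale(r, pauli_dot(n))))
    drho = [scale(0.5, pauli_dot(n)),
            scale(0.5 * r, pauli_dot(dn_th)),
            scale(0.5 * r, pauli_dot(dn_ph))]
    xs = [add(scale(-r, ID), pauli_dot(n)),
          scale(1 / r, pauli_dot(dn_th)),
          scale(1 / (r * math.sin(th) ** 2), pauli_dot(dn_ph))]
    return rho, drho, xs

def holevo_value(r, th, ph):
    """tr H Re Z + tr |sqrt(H) Im Z sqrt(H)| for the X of (G12)."""
    rho, _, xs = model(r, th, ph)
    z = [[tr(mul(rho, mul(xs[a], xs[b]))) for b in range(3)] for a in range(3)]
    h = [1 / (1 - r * r), r * r, (r * math.sin(th)) ** 2]  # diagonal QFI (assumed)
    first = sum(h[a] * z[a][a].real for a in range(3))
    # sqrt(H) Im Z sqrt(H) is a real antisymmetric 3x3 matrix, whose singular
    # values are (|m|, |m|, 0) with |m|^2 the sum of its squared upper entries.
    m2 = sum(h[a] * h[b] * z[a][b].imag ** 2
             for a in range(3) for b in range(a + 1, 3))
    return first + 2 * math.sqrt(m2)

if __name__ == "__main__":
    print(holevo_value(0.6, 1.0, 0.3), 3 + 2 * 0.6)
```

For any $0 < r < 1$ and $0 < \theta < \pi$ the returned value should agree with $3 + 2r$, the sum of (G15) and (G16).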



APPENDIX H: VAN TREES ASYMPTOTIC 
BOUND FOR 2D STATES 

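Let $\boldsymbol{\theta}$ be the column vector of the two real parameters 
$r$ and $\theta$ of Sec. IIIB, which we use to parametrize 
the states on the equatorial plane of the Bloch sphere. 
Define $\psi(\boldsymbol{\theta}) = \tfrac{1}{2}\mathbf{r}(\boldsymbol{\theta})$, where $\mathbf{r}$ is the four-dimensional 
real vector (of length 1) introduced in Sec. II. By (2.4) 
we can now write 

\[
1 - f(\boldsymbol{\theta}_0, \boldsymbol{\theta}) = \|\psi(\boldsymbol{\theta}_0) - \psi(\boldsymbol{\theta})\|^2,
\tag{H1}
\]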
showing that one minus the fidelity is the squared $L_2$ cost 
function for estimating $\psi$. Taking the two states close to 
one another, and comparing with (5.12) shows that 

\[
\psi'^{T}\psi' = \tfrac{1}{4} H,
\tag{H2}
\]

where $\psi'(\boldsymbol{\theta})$ denotes the $4\times 2$ matrix of partial derivatives 
of $\psi$ with respect to components of $\boldsymbol{\theta}$ and $H$ is the QFI. 
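Relations (H1) and (H2) are easy to confirm numerically for equatorial-plane states. The sketch below rests on assumptions that should be checked against Sec. II: it takes the unit four-vector to be $\mathbf{r} = (\sqrt{1-r^2}, \vec{r})$, uses the standard Bloch-vector form of the qubit fidelity, and takes the 2D QFI to be $H = \mathrm{diag}[1/(1-r^2), r^2]$. The function names are hypothetical.

```python
import math

def bloch(r, theta):
    """Bloch vector of an equatorial-plane state with purity r and angle theta."""
    return (r * math.cos(theta), r * math.sin(theta), 0.0)

def fidelity(r0, t0, r, t):
    """Qubit fidelity in Bloch form: (1 + r0.r + sqrt(1-r0^2) sqrt(1-r^2)) / 2."""
    dot = sum(x * y for x, y in zip(bloch(r0, t0), bloch(r, t)))
    return 0.5 * (1 + dot + math.sqrt(1 - r0 * r0) * math.sqrt(1 - r * r))

def psi(r, theta):
    """Half of the unit 4-vector (sqrt(1 - r^2), r_vec); assumed form."""
    return tuple(0.5 * c for c in (math.sqrt(1 - r * r),) + bloch(r, theta))

def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def gram(r, theta):
    """psi'^T psi' from the analytic derivatives of psi w.r.t. (r, theta)."""
    d_r = (-0.5 * r / math.sqrt(1 - r * r), 0.5 * math.cos(theta),
           0.5 * math.sin(theta), 0.0)
    d_t = (0.0, -0.5 * r * math.sin(theta), 0.5 * r * math.cos(theta), 0.0)
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    return [[dot(d_r, d_r), dot(d_r, d_t)], [dot(d_t, d_r), dot(d_t, d_t)]]

def qfi_2d(r):
    """Diagonal QFI of the equatorial-plane model in (r, theta); assumed form."""
    return [[1 / (1 - r * r), 0.0], [0.0, r * r]]

if __name__ == "__main__":
    print(1 - fidelity(0.3, 0.2, 0.7, 1.1), sq_dist(psi(0.3, 0.2), psi(0.7, 1.1)))
    g, h = gram(0.6, 0.4), qfi_2d(0.6)
    # tr H^{-1} psi'^T psi' for diagonal H
    print(g[0][0] / h[0][0] + g[1][1] / h[1][1])
```

The last printed number is $\operatorname{tr} H^{-1}\psi'^{T}\psi'$. Under the assumptions above it equals $1/2$ for every $r$ and $\theta$, which is the constant that appears on the right-hand side of the asymptotic bound.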

Let $I = I_N/N$ denote the normalized FI for $\boldsymbol{\theta}$ based 
on an arbitrary collective measurement on the $N$ copies, 
and let $\hat{\boldsymbol{\theta}}$ denote an arbitrary estimator of $\boldsymbol{\theta}$ based on that 
measurement. By $E_w$ we denote averaging over $\boldsymbol{\theta}$ with respect 
to a prior probability density $w$ over the equatorial 
plane. Then the van Trees inequality [20] states that, for 
any given matrix function $C(\boldsymbol{\theta})$ of size $\dim(\psi) \times \dim(\boldsymbol{\theta})$, 
and under certain smoothness conditions on the probability 
distribution of the outcome of the measurements 
and on the prior $w$, 

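\[
N\,E_w E_\theta \big\|\psi(\hat{\boldsymbol{\theta}}) - \psi(\boldsymbol{\theta})\big\|^2 \ge
\frac{\left(E_w \operatorname{tr} C\psi'^{T}\right)^2}
{E_w \operatorname{tr} C I C^{T} + \dfrac{1}{N}\,E_w\!\left[\dfrac{(wC)'^{T}(wC)'}{w^2}\right]},
\tag{H3}
\]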

where by $(wC)'$ we denote the column vector of the 
same length as $\psi$, with row elements $\sum_\beta \partial_\beta[w(\boldsymbol{\theta})C_{i\beta}(\boldsymbol{\theta})]$. 
By the Helstrom information inequality (5.9) we may 
bound $I$ in the denominator by $H$ (of the single-copy 
model). Without the second term in the denominator, 
the optimal choice of $C$ would be $C = \psi' H^{-1}$. Making 
this choice anyway gives 



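\[
N\,E_w E_\theta \big\|\psi(\hat{\boldsymbol{\theta}}) - \psi(\boldsymbol{\theta})\big\|^2 \ge
\frac{\left(E_w \operatorname{tr} \psi' H^{-1}\psi'^{T}\right)^2}
{E_w \operatorname{tr} \psi' H^{-1}\psi'^{T} + \dfrac{1}{N}\,E_w\!\left[\dfrac{(wC)'^{T}(wC)'}{w^2}\right]},
\qquad C = \psi' H^{-1}.
\tag{H4}
\]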

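Hence, provided the second term in the denominator is 
finite, by further substituting $\psi'^{T}\psi' = \tfrac{1}{4}H$ and letting $N$ 
converge to infinity, we obtain 

\[
\liminf_{N\to\infty}\; N\,E_w E_\theta\big(1 - f(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta})\big) \ge \frac{1}{2}.
\tag{H5}
\]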

The van Trees inequality requires some modest 
smoothness of the probability density of the measurement 
outcomes as a function of $\boldsymbol{\theta}$, which is satisfied in 
our case since the density matrix $\rho^{\otimes N}(\boldsymbol{\theta})$ is a smooth 
function of $\boldsymbol{\theta}$. It requires smoothness of the prior density 
$w$ and also that this density converges to zero at the 
boundary of its support. This last property does not hold 
for the priors in which we are interested. However, for 
a given prior $w$ and for given $\epsilon > 0$ one can construct a 
prior $w_\epsilon$ which is zero outside a circle of radius strictly 
smaller than 1, which converges smoothly to zero at the 
boundary of its support, and which is everywhere smaller 
than $(1 + \epsilon)w$. The modification of $w$ can simultaneously 
be done ensuring that the second term in the denominator 
of (H4) is finite. Since 

\[
E_w E_\theta\big(1 - f(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta})\big) \ge
\frac{1}{1+\epsilon}\,E_{w_\epsilon} E_\theta\big(1 - f(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta})\big),
\]

we can first derive (H5) with $w$ replaced by $w_\epsilon$, then 
let $\epsilon \to 0$, resulting in (H5) with the original $w$ in place. 